Applications of fractal and/or chaotic techniques

ABSTRACT

This invention relates to the application of techniques based upon the mathematics of fractals and chaos in various fields including document verification, data encryption and weather forecasting. The invention also relates, in one of its aspects, to image processing.

[0001] This invention relates to the application of techniques based upon the mathematics of fractals and chaos in various fields including document verification, data encryption. The invention also relates, in one of its aspects to image processing.

[0002] For convenience, the description which follows is divided into five sections each relating to a respective aspect or set of aspects of the invention.

SECTION 1

[0003] Making Money From Fractals and Chaos: Microbar™

[0004] Introduction

[0005] We are all accustomed to the use of bar coding which was first introduced in the late 1960s in California and has grown to dominate commercial transactions of all types and sizes. Microbar™ is a natural extension of the idea but with some important and commercially viable subtleties that are based on the application of fractal geometry and chaos.

[0006] The origins of Microbar go back to the mid 1990s and like all good ideas, were based on asking the right questions at the right time: Instead of using 1D bar codes why not try 2D dot codes? One of the reasons for considering this simple extension was due to the dramatic increase in the number of products that required bar code tagging. Another, more important reason, concerned the significant increase in counterfeit products.

[0007] Bar Codes

[0008] Product numbering or bar coding in the UK is the responsibility of the e-centre UK who issue unique bar codes for different products. The e-centre UK was a founder member of the European Article Numbering (EAN) Association, which is now known as EAN International. The EAN system was developed in 1976, following on from the success of an American system which was adopted as an industry standard in 1973. EAN tags are unique and unambiguous, and can identify any item anywhere in the world. These numbers are represented by bar codes which can be read by scanners throughout the supply chain, providing accurate information for improved management. As the number of products increases, so the number of bits required to represent a product uniquely must increase. The EAN system has recently introduced a new 128 bit bar-code (the EAN-128) to provide greater information on a larger diversity of products. They are used on traded units; retail outlets use a EAN-18 bar code.

[0009] Microbar's Origins

[0010] Compared with a conventional bar code, a Microbar serves two purposes: (i) converting from a 1D bar code to a 2D dot code provides the potential for greater information density; (ii) this information can be embedded into the product more compactly making it more difficult to copy.

[0011] In the early stages of Microbar's development, it was clear that a conventional laser scanning system would have to be replaced by a specialist reader—instead of scanning a conventional bar code with a “pencil line” laser beam, an image reader/decoder (hand-held or otherwise) would need to be used. The original idea evolved from the laser speckle coding techniques used to authenticate the components of nuclear weapons. It was developed by Professor Nick Phillips (Director of the Centre for Modern Optics at De Montfort University) and by Dr William Johnson (Chief Executive of Durand Technology Limited) and focused on the anti-counterfeiting market. It was based on a 2D dot code formed from a matrix of micro-reflectors. When exposed to laser light, a CCD camera records the scattered intensity from which the pattern is recovered (via suitable optics and appropriate digital image processing). The micro-reflectors (which look-like white dots in a black background) are embedded into a tiny micro-foil which is then attached to the product as a micro-label. The pattern of dots is generated by implementing a pseudo random number generator and binarizing the output to give a so called stochastic mask. This mask is then burnt into a suitable photopolymer. (Its a bit like looking at “cats eyes” on the road when driving in the dark, except that instead of being placed at regular intervals along the centre of the road, they are randomly distributed all over it.) The “seed” used to initiate the random number generator and the binarization threshold represent the “keys” used for identifying the product. If the stochastic mask for a given product correlates with the template used in the identification processes, then the product is passed as being genuine.

[0012] As always, good ideas suffer from technical, bureaucratic and capital investment problems (especially in the UK). In this case the main problem has been the high cost of introducing an optical Microbar into security documents and labels and the specialist optical readers/decoders required to detect and verify the codes. An additional problem is that counterfeiters are not stupid! Indeed, some of the best ideas for anti-counterfeiting technology along with methods of encryption, computer virus algorithms, hacking, cracking and so on are products of the counterfeit/criminal mind whose ideas often transcend those of an established authority. Whatever is put onto a label or at least, is seen to be on it, can in principle be copied (if enough effort is invested). For example, the holograms that are commonly used on debit and credit cards, software licensing agreements and on the new twenty pound note are relatively easy targets for counterfeiters. Furthermore, contrary to public opinion, such holograms convey no information whatsoever about the authentication of the product. As long as it looks right, its all right. Thus, although the optical Microbar could in principle provide a large amount of information pertinent to a given product, it was still copyable. What was required was a covert equivalent.

[0013] In Comes Russia

[0014] In 1996, De Montfort University won a prestigious grant from the Defence Evaluation and Research Agency at Malvern (formerly the royal Signals and Radar Establishment) to investigate novel methods of encryption and covert technology for digital communication systems (including radio, microwave and ATM networks). The aim was to develop a new digital Enigma-type machine based on the applications of fractals and chaos. This grant was (and is) unique in that it was awarded on the basis of employing a number of Research Assistants (mathematicians, computer scientists and engineers) from the Moscow State Technical University (MSTU). Since the end of the cold war, De Montfort University has had a long standing Memorandum of Agreement with MSTU—a university whose graduates include some of the great names in Russian science and engineering, including the aerodynamicist Tupolev and the inventor of Russian Radar and the current Vice Chancellor, Professor Federov. As expressed at the time by all concerned, if we had previously suggested that one day, young Russian scientists wkould be employed in the UK, financed by HMS government working on state of the art military communications systems, than off to hospital we would have gone!

[0015] One of the projects was based on using random scaling fractals to code bit streams. The technique, which later came to be known as Fractal Modulation, worked on the same principles as Frequency Modulation; instead of transmitting a coded bit stream by modulating the frequency of a sine wave generator, the fractal dimension of a fractal noise generator is modulated. In addition to spread spectrum and direct sequencing, Fractal Modulation provides a further covert method of transmission with the aim of making the transmitted signal “look like” background noise. Not only does the enemy not know what is being said (as a result of bit stream coding) but is not sure whether a transmission is taking place. As the project developed, it was realised that if a 2D bit map was considered instead of a 1D bit stream, then an image could be created which “looked like” noise but actually had information embedded in it. The idea evolved of introducing a technique that has a synergy with the conventional electronic water mark (commonly used in the transmission of digital images) and fractal camouflage but is more closely related to a Microbar where a random bit map is converted into a map of fractal noise. Thus, the Microbar evolved from being a stochastic mask composed of micro-reflectors implemented using laser optics to a “stochastic agent” used to encode information in a covert way using digital technology. That was the idea. Getting it to work using conventional printing and scanning technology has taken time but was done in the knowledge that specialist optical devices and substrate's would not be required and that a working system could be based on existing digital printer/reader technology as used by all the major security document printing companies.

[0016] Why Does It Work?

[0017] The digital Microbar system is a type of Steganography in which secret codes are introduced into an image without appearing to change it. For example, suppose you send an innocent memorandum discussing the weather or something, which you know will be intercepted. A simple code can be introduced by putting pin holes through appropriate letters in the text. Taking these letters from the text in a natural or pre-arranged order will allow the recipient of the document to obtain a coded message (providing of course, the interceptor does not see the pin holes and wonder why they are there!). Microbar technology uses a similar idea but makes the pin holes vanish (well sort of), using a method that is based on the use of self-affine stochastic fields.

[0018] Suppose you are shown two grey level images of totally different objects (a face and a house for example) but whose distribution in grey levels is exactly the same. If you were asked the question, are the images the same? then your answer will be “no”. If you were asked whether the images are statistically the same, your answer might be “I don't know” or “in what sense?” When we look at an image, our brain attempts to interpret it in terms of a set of geometric correlations with a library of known templates (developed from birth), in particular, information on the edges or boundaries of features which are familiar to us. It is easy to confuse this form of neural image processing by looking at pictures of objects that do not conform to our perception of the world—the Devil's triangle or Escher's famous lithograph “ascending and descending” for example. Thus, our visual sense is based (or has developed) on correlations that conform to a Euclidean perspective of the world. Imagine that our brain interpreted images through their statistics alone. In this case, if you were given the two images discussed above and asked the same question, you would answer “yes”. Suppose then that we construct two images of the same object but modify the distribution of grey levels of one of them in such a way that our (geometric) interpretation of the images is the same. Further, add colour into the “equation” in which the red, green and blue components can all have different statistics and it is clear that we can find many ways of confusing the human visual system because it is based on a Euclidean geometric paradigm with colour continuity. Moreover, construct an image which has all these properties but in addition, is statistically self-affine so that as we zoom into the image, the distribution of its RGB components are the same. Without going into the details of the encryption and decoding processes (which remain closed anyway), these are some of the basic principles upon which the current Microbar system works. In short, a Microbar introduces a stochastic agent into a digital image (encryption) which has three main effects: (i) it changes the statistics of the image without changing the image itself (covert); (ii) these statistics can be confirmed (or otherwise) at arbitrary scales (fractals); (iii) any copy made of the image introduces different statistics since no copy can be a perfect replica (anti-counterfeiting). Point (iii) is the reason why the Microbar can detect copies. Point (ii) is the reason why detection does not have to be done by a high resolution (slow) reader and point (i) is why it can't be seen. There is one further and important variation on a theme. By embedding a number of Microbar's into a printed document at different(random) locations, it is possible to produce an invisible code (similar to the “pin holes” idea discussed at the start of this Section). This code (i.e. the Microbars” coordinates) can be generated using a standard or preferably non-standard encryption algorithm whose key(s) are related via another encryption algorithm to the serial number(s) of the document or bar code(s). In the case of non-standard encryption algorithms, chaotic random number generation is used instead of conventional pseudo random number generation. For each aspect of the Microbar “secrets” discussed above, there are many refinements and adjustments required to get the idea to work in practice which depend on the interplay between the digital printer technology available, reader specifications, cost and encryption hierarchy (related to the value of the document to be encrypted).

[0019] Current State of Play

[0020] Introducing stochastic agents into printed or electronically communicated information has a huge number of applications. The commercial potential of Microbar™ was realised early on. As a result, a number of international patents have been established and a new company “Microbar Security Limited” set up in partnership with “Debden Security Printing”—the commercial arm of the Bank of England, where a new cryptography office has been established and where the “keys” associated with the whole process for each document can be kept physically and electronically secure. In June this year, Microbar was demonstrated for the first time publicly at the “First World Product and Image Security Convention” held in Barcelona. The demonstration was based on a Microbar encryption of a bank bond and a COTS (Commercial Of The Shelf) system developed to detect and decode the Microbar. The unveiling of this demonstration prototype has led to a number of contracts with leading security printing company's in the UK, USA, Germany, Russia and the Far East. One of the reasons for starting at the top (i.e. with very high value documents—bank bonds) was due to the fact that a major contribution to the decline of the Russian economy last year related to a rapid increase in the exchange of counterfeit Russian bank bonds. The IMF requested that the Federal Bank of Russia reduce the quantity of Roubles being printed in late 1997, a request which was agreed to, but traded-off by an increase in the production of bank bonds (this will not happen again with Microbar™).

[0021] The Future

[0022] The use of Microbar™ in the continuing battle against forgery will be of primary importance over the next few years. With the increased use of anti-counterfeit features for currency, Microbar represents a general purpose technology which can and should be used in addition to other techniques that include the use of fluorescent inks, foil holograms, optical, infrared and thermal watermarks, phase screens, enhanced paper/print quality, microprinting and so on. However, one of the most exiting prospects for the future is in its application to Smartcard technology and e-commerce security. As an added bonus, the theoretical models used to generate and process Microbar encrypted data are being adapted to analyse financial data and to develop a new and robust macro-economic volatility prediction metric. Thus, in the future, Microbar™ may not only be used to authenticate money but to help money keep its value!

[0023] Finally, self-affine data analysis is currently being applied to medicine. Early trials have shown that epedemiological time series data is statistically self-affine, irrespective of the type of disease. This may lead to new relationships between the study of health in terms of cause and effect. This approach—called Medisine™—will be of significant value in the analysis of health case and government expenditure in the next millenium.

[0024] In the above description, the references to “1D” and “2D” are, of course, abbreviations for one-dimensional (referring to a linear arrangement or series of marks) and two-dimensional (referring to an array of marks, e.g. on a flat sheet distributed in two perpendicular directions on the sheet, for example). respectively. The use of random scaling factors, fractal statistics, and the term “self-affine”, inter alia, are discussed in more detail in WO99/17260 which is incorporated herein by reference.

[0025] In the present specification, “comprise” means “includes or consists of” and “comprising” means “including or consisting of”.

[0026] The features disclosed in the foregoing description, or the following claims, or the accompanying drawings, expressed in their specific forms or in terms of a means for performing the disclosed function, or a method or process for attaining the disclosed result, as appropriate, may, separately, or in any combination of such features, be utilised for realising the invention in diverse forms thereof.

SECTION 2

[0027] Anti-Counterfeiting and Signature Verification System

[0028] This invention relates to an anti-counterfeiting and signature verification system, and to components thereof The invention is particularly, but not exclusively, applicable to credit and debit cards and the like.

[0029] A typical credit or debit card has currently, on a reverse side of the card, a magnetic stripe adapted to be read by a magnetic card reader and a stripe of material adapted to receive the bearer's signature, executed generally by ballpoint pen in oil-based ink. The last-noted stripe, herein referred to, for convenience, as the signature stripe, may be pre-printed with a pattern or wording so as to make readily detectable any erasure of a previous signature and substitution of a new signature on a card which has been stolen. For the same reason, the signature stripe normally comprises a thin coating of a paint or plastics material covering wording, (such as “VOID”), on the card substrate, so that any attempt to remove the original signature by scraping the top layer off the signature stripe with a view to substituting the criminal's version of the legitimate card bearer's signature is likely to remove the stripe material in its entirety, leaving the underlying wording exposed to view. Whilst these measures safeguard against the more inept attempts to substitute signatures on stolen credit or debit cards, they are less effective against better-equipped criminals who may possess, or have access to, equipment capable of, for example, removing original signature stripes in their entirety and applying fresh signature stripes printed with a counterfeit copy of any pre-printed marking or wording originally present and which cards may be then be supplied to criminals who can “sign” the cards and subsequently use them fraudulently

[0030] It is among the objects of the invention to provide a system and components of such system, which will prevent, or at least render more difficult, criminal activities of the type discussed above.

[0031] According to the invention, there is provided a document, card, or the like, having an area adapted to receive a signature or other identifying marking, and which bears a two-dimensional coded marking adapted for reading by a complementary automatic reading device.

[0032] Preferably, the complementary automatic reading device includes means for detecting, from a perceived variation in such coding resulting from subsequent application of a signature, whether such signature corresponds with a predetermined authentic signature. The term “corresponds” in this context may signify an affirmative outcome of a more or less complex comparison algorithm adapted to accept as authentic signatures by the same individual who executed the predetermined signature, but to reject forged versions of such signatures executed by other individuals.

[0033] The two-dimensional coded marking referred to above may take the form referred to, for convenience, as “Microbar” in the Appendix forming part of this specification and may be a fractal coded marking of the kind disclosed in W099/17260, which is incorporated herein by reference.

[0034] In a preferred embodiment of the invention, as applied for example, to a credit card or debit card, a signature stripe on the card, as provided by the issuing bank or other institution, carries, as a unique identification, a two dimensional coded marking of the type referred to as “Microbar” in the annex hereto, which can be read by a complementary reading device which can determine on the basis of predetermined decryption algorithms not only the authenticity of the marking but also the unique identity thereof, (i.e. the device can ascertain, from the coded marking, the identity of the legitimate bearer, his or her account number, and other relevant details encoded in the marking). The complementary reading device will, it is envisaged, normally be an electrically operated electronic device with appropriate microprocessor facilities, thereading device being capable of communication with a central computing and database facility at premises of the bank or other institution issuing the card. The coding on the signature stripe is preferably statistically fractal in character (c.f. WO99/17260), with the advantage that minor damage to the stripe, such as may be occasioned by normal “wear and tear” will not prevent a genuine signature stripmarking being detected as genuine nor prevent the identification referred to.

[0035] It will be understood that the writing of a signature on the signature strip has the potential to alter the perception of the coded marking by the complementary reading device. However, because of the fractal nature of the coded marking, (or otherwise, because an appropriate measure of redundancy is incorporated in the marking, the application of a signature to the signature stripe does not, any more than the minor wear and tear damage referred to above, prevent identification of the marking by the reading device nor derivation of the information as to the identity of the legitimate card bearer, etc. Nevertheless, the reader and, more particularly, the associated data processing means, is arranged inter aha to execute predetermined algorithms to determine whether the effect of the signature on the signature stripe it has read is an effect attributable to the signature of the legitimate card bearer or is an effect indicative of some other marking, such as a forged signature applied to the signature strip. The reading device makes this determination by reference to data already held, e.g. at the central computing and database facility, relating to the signature of the legitimate card bearer, (for example derived from analysis of several sample signatures of the legitimate card bearer, applied to signature areas of base documents, bearing corresponding two-dimensional coded markings. The reading device may, in effect, subtract, from the pre-applied coded marking, the effects of a legitimate card bearer's signature and determine whether the result is consistent with the original, virgin, coded signature stripe. This procedure, assisted by the high statistical information density of the “Microbar” marking and the complexity of the statistical data in such marking, should actually prove simpler and more reliable than known automated signature recognition procedures. This increased simplicity and reliability may be attributable to a species of what is termed mathematically as “stochastic resonance”.

[0036] Thus, in preferred embodiments of the invention, not only is it possible for a credit card or debit card, for example, to carry in unobtrusively encrypted form not readily reproducible by a counterfeiter, but readily readable by the appropriate reading device, information identifying the legitimate user of the card, such as his account number, but it is possible for the reading device to verify the authenticity of the signature on the card.

[0037] In another embodiment of the invention, there is provided a credit or debit card or the like in which an image of the card bearer's signature is printed on the card by the bank or other issuing institution, being for example an image of a sample signature provided by the bearer to the bank when the relevant account was opened. The surface of the card bearing such image may, for example, be covered by a transparent resin layer, making undetected interference with the image virtually impossible. In this case the “Microbar” coding on the card may also be incorporated in the black markings which form the signature as well as on the surrounding area of the card, so that, for example, the signature on the card can have the same statistical fractal identity as the remainder, and can at any rate form part of the overall coded marking of the card. In general, where a signature is to be checked locally, e.g. at a point of sale, for authenticity, it may be appropriate to ensure that the area where the “test” signature is to be written, e.g. on a touch sensitive panel, should be of the same size and shape as an area to which the original “sample” signature was limited so that the person signing at the point of sale is placed under the same constraints as he was under when supplying the “sample” signature. The automatic signature reader can then be arranged to be sensitive to different effects such constraints may have on different persons so as to be even more likely to detect forgery.

[0038] In yet another embodiment, there may be no coded marking in the black lines forming the signature, but the remainder of the panel or area on the card receiving the printed signature has controlled fractal noise added in such a way that, whatever the signature, the signature panel, as a whole, of any card of the same type, has the same fractal statistics, and as a result, an automatic a card reader can check for authenticity simply by checking that the fractal statistics of the signature panel as a whole correspond to a predetermined set of such statistics. Many variations on this theme are possible. Thus, for example, the signature panel on the card may be sub-divided, notionally, into sub-panels, (the sub-panels would not necessarily be visible), with thefractal noise in the non-black portions of each sub-panel being adjusted to ensure that each sub-panel has the same fractal statistics, or has fractal statistics which are pre-determined for that sub-panel position.

ANNEX

[0039] Introduction

[0040] We are all accustomed to the use of bar coding which was first introduced in the late 1960s in California and has grown to dominate commercial transactions of all types and sizes. Microbar™ is a natural extension of the idea but with some important and commercially viable subtleties that are based on the application of fractal geometry and chaos.

[0041] The origins of Microbar™ go back to the mid 1990s and like all good ideas, were based on asking the right questions at the right time: Instead of using 1D bar codes why not try 2D dot codes? One of the reasons for considering this simple extension was due to the dramatic increase in the number of products that required bar code tagging. Another, more important reason, concerned the significant increase in counterfeit products.

[0042] Bar Codes

[0043] Product numbering or bar coding in the UK is the responsibility of thee-centre UK who issue unique bar codes for different products. The e-centre UK was a founder member of the European Article Numbering (EAN)Association, which is now known as EAN International. The EAN system was developed in 1976, following on from the success of an American system which was adopted as an industry standard in 1973. EAN tags are unique and unambiguous, and can identify any item anywhere in the world. These numbers are represented by bar codes which can be read by scanners throughout the supply chain, providing accurate information for improved management. As the number of products increases, so the number of bits required to represent a product uniquely must increase. The EAN system has recently introduced a new 128 bit bar-code (the EAN-128) to provide greater information on a larger diversity of products. They are used on traded units; retail outlets use a EAN-18 bar code.

[0044] Microbar's Origins

[0045] Compared with a conventional bar code, a Microbar serves two purposes: (i) converting from a 1D bar code to a 2D dot code provides the potential for greater information density; (ii) this information can be embedded into the product more compactly making it more difficult to copy.

[0046] In the early stages of Microbar's development, it was clear that a conventional laser scanning system would have to be replaced by a specialist reader—instead of scanning a conventional bar code with a “pencil line” laser beam, an image reader/decoder (hand-held or otherwise) would need to be used. The original idea evolved from the laser speckle coding techniques used to authenticate the components of nuclear weapons. It was developed by Professor Nick Phillips (Director of the Centre for Modern Optics at De Montfort University) and by Dr William Johnson (Chief Executive of Durand Technology Limited) and focused on the anti-counterfeiting market. It was based on a 2D dot code formed from a matrix of micro-reflectors. When exposed to laser light, a CCD camera records the scattered intensity from which the pattern is recovered (via suitable optics and appropriate digital image processing). The micro-reflectors (which look-like white dots in a black background) are embedded into a tiny micro-foil which is then attached to the product as a micro-label. The pattern of dots is generated by implementing a pseudo random number generator and binarizing the output to give a so called stochastic mask. This mask is then burnt into a suitable photopolymer. (Its a bit like looking at “cats eyes” on the road when driving in the dark, except that instead of being placed at regular intervals along the centre of the road, they are randomly distributed all over it.) The “seed” used to initiate the random number generator and the binarization threshold represent the “keys” used for identifying the product. If the stochastic mask for a given product correlates with the template used in the identification processes, then the product is passed as being genuine.

[0047] As always, good ideas suffer from technical, bureaucratic and capital investment problems (especially in the UK). In this case the main problem has been the high cost of introducing an optical Microbar into security documents and labels and the specialist optical readers/decoders required to detect and verify the codes. An additional problem is that counterfeiters are not stupid! Indeed, some of the best ideas for anti-counterfeiting technology along with methods of encryption, computer virus algorithms, hacking, cracking and so on are products of the counterfeit/criminal mind whose ideas often transcend those of an established authority. Whatever is put onto a label or at least, is seen to be on it, can in principle be copied (if enough effort is invested). For example, the holograms that are commonly used on debit and credit cards, software licensing agreements and on the new twenty pound note are relatively easy targets for counterfeiters. Furthermore, contrary to public opinion, such holograms convey no information whatsoever about the authentication of the product. As long as it looks right, its all right. Thus, although the optical Microbar could in principle provide a large amount of information pertinent to a given product, it was still copyable. What was required was a covert equivalent.

[0048] In Comes Russia

[0049] In 1996, De Montfort University won a prestigious grant from the Defence Evaluation and Research Agency at Malvern (formerly the royal Signals and Radar Establishment) to investigate novel methods of encryption and covert technology for digital communication systems (including radio, microwave and ATM networks). The aim was to develop a new digital Enigma-type machine based on the applications of fractals and chaos. This grant was (and is) unique in that it was awarded on the basis of employing a number of Research Assistants (mathematicians, computer scientists and engineers) from the Moscow State Technical University (MSTU). Since the end of the cold war, De Montfort University has had a long standing Memorandum of Agreement with MSTU—a university whose graduates include some of the great names in Russian science and engineering, including the aerodynamicist Tupolev and the inventor of Russian Radar and the current Vice Chancellor, Professor Federov. As expressed at the time by all concerned, if we had previously suggested that one day, young Russian scientists wkould be employed in the UK, financed by HMS government working on state of the art military communications systems, than off to hospital we would have gone!

[0050] One of the projects was based on using random scaling fractals to code bit streams. The technique, which later came to be known as Fractal Modulation, worked on the same principles as Frequency Modulation; instead of transmitting a coded bit stream by modulating the frequency of a sine wave generator, the fractal dimension of a fractal noise generator is modulated. In addition to spread spectrum and direct sequencing, Fractal Modulation provides a further covert method of transmission with the aim of making the transmitted signal “look like” background noise. Not only does the enemy not know what is being said (as a result of bit stream coding) but is not sure whether a transmission is taking place. As the project developed, it was realised that if a 2D bit map was considered instead of a 1D bit stream, then an image could be created which “looked like” noise but actually had information embedded in it. The idea evolved of introducing a technique that has a synergy with the conventional electronic water mark (commonly used in the transmission of digital images) and fractal camouflage but is more closely related to a Microbar where a random bit map is converted into a map of fractal noise. Thus, the Microbar evolved from being a stochastic mask composed of micro-reflectors implemented using laser optics to a “stochastic agent” used to encode information in a covert way using digital technology. That was the idea. Getting it to work using conventional printing and scanning technology has taken time but was done in the knowledge that specialist optical devices and substrate's would not be required and that a working system could be based on existing digital printer/reader technology as used by all the major security document printing companies.

[0051] Why Does It Work?

[0052] The digital Microbar system is a type of Steganography in which secret codes are introduced into an image without appearing to change it. For example, suppose you send an innocent memorandum discussing the weather or something, which you know will be intercepted. A simple code can be introduced by putting pin holes through appropriate letters in the text. Taking these letters from the text in a natural or pre-arranged order will allow the recipient of the document to obtain a coded message (providing of course, the interceptor does not see the pin holes and wonder why they are there!). Microbar technology uses a similar idea but makes the pin holes vanish (well sort of), using a method that is based on the use of self-affine stochastic fields.

[0053] Suppose you are shown two grey level images of totally different objects (a face and a house for example) but whose distribution in grey levels is exactly the same. If you were asked the question, are the images the same? then your answer will be “no”. If you were asked whether the images are statistically the same, your answer might be “I don't know” or “in what sense?” When we look at an image, our brain attempts to interpret it in terms of a set of geometric correlations with a library of known templates (developed from birth), in particular, information on the edges or boundaries of features which are familiar to us. It is easy to confuse this form of neural image processing by looking at pictures of objects that do not conform to our perception of the world—the Devil's triangle or Escher's famous lithograph “ascending and descending” for example. Thus, our visual sense is based (or has developed) on correlations that conform to a Euclidean perspective of the world. Imagine that our brain interpreted images through their statistics alone. In this case, if you were given the two images discussed above and asked the same question, you would answer “yes”. Suppose then that we construct two images of the same object but modify the distribution of grey levels of one of them in such a way that our (geometric) interpretation of the images is the same. Further, add colour into the “equation” in which the red, green and blue components can all have different statistics and it is clear that we can find many ways of confusing the human visual system because it is based on a Euclidean geometric paradigm with colour continuity. Moreover, construct an image which has all these properties but in addition, is statistically self-affine so that as we zoom into the image, the distribution of its RGB components are the same. Without going into the details of the encryption and decoding processes (which remain closed anyway), these are some of the basic principles upon which the current Microbar system works. In short, a Microbar introduces a stochastic agent into a digital image (encryption) which has three main effects: (i) it changes the statistics of the image without changing the image itself (covert); (ii) these statistics can be confirmed (or otherwise) at arbitrary scales (fractals); (iii) any copy made of the image introduces different statistics since no copy can be a perfect replica (anti-counterfeiting). Point (iii) is the reason why the Microbar can detect copies. Point (ii) is the reason why detection does not have to be done by a high resolution (slow) reader and point (i) is why it can't be seen. There is one further and important variation on a theme. By embedding a number of Microbar's into a printed document at different(random) locations, it is possible to produce an invisible code (similar to the “pin holes” idea discussed at the start of this Section). This code (i.e. the Microbars” coordinates) can be generated using a standard or preferably non-standard encryption algorithm whose key(s) are related via another encryption algorithm to the serial number(s) of the document or bar code(s). In the case of non-standard encryption algorithms, chaotic random number generation is used instead of conventional pseudo random number generation. For each aspect of the Microbar “secrets” discussed above, there are many refinements and adjustments required to get the idea to work in practice which depend on the interplay between the digital printer technology available, reader specifications, cost and encryption hierarchy (related to the value of the document to be encrypted).

[0054] Current State of Play

[0055] Introducing stochastic agents into printed or electronically communicated information has a huge number of applications. The commercial potential of Microbar™ was realised early on. As a result, a number of international patents have been established and a new company “Microbar Security Limited” setup in partnership with “Debden Security Printing”—the commercial arm of the Bank of England, where a new cryptography office has been established and where the “keys” associated with the whole process for each document can be kept physically and electronically secure. In June this year, Microbar was demonstrated for the first time publicly at the “First World Product and Image Security Convention” held in Barcelona. The demonstration was based on a Microbar encryption of a bank bond and a COTS (Commercial Of The Shelf) system developed to detect and decode the Microbar. The unveiling of this demonstration prototype has led to a number of contracts with leading security printing company's in the UK, USA, Germany, Russia and the Far East. One of the reasons for starting at the top (i.e. with very high value documents—bank bonds) was due to the fact that a major contribution to the decline of the Russian economy last year related to a rapid increase in the exchange of counterfeit Russian bank bonds. The IMF requested that the Federal Bank of Russia reduce the quantity of Rubles being printed in late 1997, a request which was agreed to, but traded-off by an increase in the production of bank bonds (this will not happen again with Microbar™).

[0056] The Future

[0057] The use of Microbar™ in the continuing battle against forgery will be of primary importance over the next few years. With the increased use of anti-counterfeit features for currency, Microbar represents a general purpose technology which can and should be used in addition to other techniques that include the use of fluorescent inks, foil holograms, optical, infrared and thermal watermarks, phase screens, enhanced paper/print quality, micro printing and so on. However, one of the most exiting prospects for the future is in its application to Smartcard technology and e-commerce security. As an added bonus, the theoretical models used to generate and process Microbar encrypted data are being adapted to analyse financial data and to develop a new and robust macro-economic volatility prediction metric. Thus, in the future, Microbar™ may not only be used to authenticate money but to help money keep its value!

[0058] Finally, self-affine data analysis is currently being applied to medicine. Early trials have shown that epidemiological time series data is statistically self-affine, irrespective of the type of disease. This may lead to new relationships between the study of health in terms of cause and effect. This approach—called Medisine™ —will be of significant value in the analysis of health case and government expenditure in the next millenium.

SECTION 3

[0059] Data Encryption and Modulation Using Fractals and Chaos

[0060] This invention relates to encryption and to data carriers, communication systems, document verification systems and the like embodying a novel and improved encryption method.

[0061] Encryption methods are known in which encrypted data takes the form of a pseudo-random number sequence generated in accordance with a predetermined algorithm operating upon a seed value and the data to be encrypted.

[0062] In accordance with the present invention, however, by a replacement of a standard algorithm that generates the encryption field (R₁, R₂, . . . R_(N)) with a chaotic algorithm, a greater level of security can be developed. In preferred embodiments of the invention, in addition, by using different classes of chaoticity at different times the level of security can be increased through what is in effect the introduction of non-stationary chaoticity. The nature of the invention in its preferred embodiments will be apparent from the research which forms the Annexe which constitutes the latter part of the present application.

[0063] The essence of the chaotic encryption technique is illustrated in Section 10.5(page x1-x2) of the Annexe which shows the principle of random chaotic number encryption, fractal modulation and there the demodulation plus de-encryption. The vitally important point here is embedded in the innocent little phrase on page x: “A sequence of pseudo-random or chaotic integers (R₀, R₁, . . . R_(N),) . . . ”. Conventional encryption software is based exclusively on the use pseudo-random number generators for which there is a “standard algorithm”. This standardisation is one of the principal reasons why there is an increase in hacking. By a simple replacement of a standard algorithm that generates the encryption field (R₁, R₂, . . . R_(N),) with a chaotic algorithm, a greater level of security can be developed. In addition, by using different classes of chaoticity at different times, the level of security can be increased through what is in effect the introduction of non-stationary chaoticity. This approach uses a chaotic data field R₁ and not a pseudo-random number field. Since there is in principle an unlimited class of chaotic random number generating algorithms this introduces the idea of designing a symmetric encryption system in which the key is a user defined algorithm (together with associated parameters) and an asymmetric system in which the public key is one of a wide range of algorithms operating for a limited period of time and distributed to all users during such a period. In the latter case, the private key is a number that is used to “drive” the algorithm via one or more of the parameters available.

[0064] This approach involves changes to aspects of conventional encryption systems in which the “hard-wired” components, common to most commercial systems, are changed. All interfaces, data structures, etc. can remain the same in such a way that the user would not notice any difference. This aspect is in itself important as it would not flag to users of such a system that any fundamental changes have taken place, thus increasing the level of security associated with the introduction of chaos based encryption.

ANNEX

[0065] Data Encryption and Modulation Using Fractals and Chaos

[0066] Many techniques of coding and cryptography have been developed for protecting the confidentiality of the transmission of information over different media, including telephone lines, mobile radio, satellites and the Internet. In each technique, the purpose of the coding and encryption processes is to improve the reliability, privacy and integrity of the transmitted information. It is imperative that any encryption algorithm is not capable of being “cracked”. In simple terms this means that the possibility of finding out the original plain text from the corresponding cypher text (without knowing the appropriate encryption key) must be so small as to be discounted in practical terms. If this is true for a particular encryption algorithm, then the algorithm is said to be “cryptographically strong”.

[0067] With the rapidly growing use of the Internet for business transaction of all types and e-commerce in general, the design and implementation of cryptographically strong algorithms is becoming more and more important. However, a number of recent events have brought the true meaning of the term “cryptographically strong” into question. The increasing ability for hackers to penetrate sensitive communications systems means that a new generation of encryption software is required. This report, discuss an approach which is based on the use of fractals and chaos.

[0068] One of the principle problems with conventional encryption software is that the “work horse” is still based on a relatively primitive pseudo random number generator using variations on a theme of the linear congruential method. In this work, we consider the use of iterated sequences that lead to chaos and the generation of chaotic random numbers for bit stream coding. Further, we study the use of random fractals for coding bit streams(coded or otherwise) in terms of variations in fractal dimension (Fractal Modulation) such that the digital signal is characteristic of the background noise associated with the medium through which information is to be transmitted. Thisform of data encryption/modulation is of value in the transmission of sensitive information and represents an alternative and potentiallymore versatile approach to scrambling bit streams which has so far not been implemented in any commercial sector.

[0069] This report is in two principal parts; the first part provides a general introduction to cryptography and encryption (Chapters 1-3) and the second part provides background on the random number generators, chaotic processes and fractal signals (Chapters 4-8) used to develop the encryption system discussed in Chapters 9 and 10.

[0070] 1. Introduction

[0071] The need to keep certain messages secret has been appreciated for thousands of years. The advantages gained from intercepting secret information is self-evident, and this has led to a continuous, fascinating battle between the “codemakers”and the “codebreakers”. The arena for this contest is the communications medium which has changed considerably over the years. It was not until the arrival of the telegraph that the art of communications, as we know it today, began. Society is now highly dependent on fast and accurate means of transmitting messages. As well as the long-established forms such as post and courier services, we now have more technical and sophisticated media such as radio, television, telephone, telex, fax, e-mail, videoconferencing and high speed data links. Usually the main aim is merely to transmit a message as quickly and cheaply as possible. However, there are a number of situations where the information is confidential and where an interceptor might be able to benefit immensely from the knowledge gained by monitoring the information circuit. In such situations, the communicants must take steps to conceal and protect the content of their message.

[0072] The purpose of this research monograph, is to provide an overview of an encryption technique based on chaotic random number sequences and fractal coding. We discuss a signal processing technique which enables digital signals to be transmitted confidentially and efficiently over a range of digital communications channels.

[0073] Transmitted information, whether it be derived from speech, visual images or written text, needs in many circumstances to be protected against eavesdropping. Access to the services provided by network operators to enable telecommunications must be protected so that charges for using the services can be properly levied against those that use them. The telecommunications services themselves must be protected against abuse which may deprive the operator of his revenue or undermine the legitimate prosecution of law enforcement.

[0074] The application of random fractal geometry for modelling naturally occurring signals (noise) and visual camouflage is well known. This is due to the fact the statistical and/or spectral characteristics of random fractals are consistent with many objects found in nature; a characteristic which is compounded in the term “statistical self-affinity”. This term refers to random processes which have similar probability density functions at different scales. For example, a random fractal signal is one whose distribution of amplitudes remains the same whatever the scale over which the signal is sampled. Thus, as we zoom into a random fractal signal, although the pattern of amplitude fluctuations will change across the field of view, the distribution of these amplitudes remains the same. Many noises found in nature are statistically self-affine including transmission noise.

[0075] Data Encryption and Camouflage using Fractals and Chaos (DECFC) is a technique whereby binary data is converted into sequences of random fractal signals and then combined in such a way that the final signal is indistinguishable from the background noise a system through which information is transmitted.

[0076] 2. Cryptography

[0077] 2.1 What is Cryptography?

[0078] The word cryptography comes from Greek; kryptos means “hidden” while graphia stands for “writing”. Cryptography is defined as “the science and study of secret writing” and concerns the ways in which communications and data can be encoded to prevent disclosure of their contents through eavesdropping or message interception, using codes, cyphers, and other methods.

[0079] Although the science of cryptography is very old, the desktop computer revolution has made it possible for cryptographic techniques to become widely used and accessible to non experts.

[0080] The history of cryptography can be traced from Ancient Egypt through to the present day. From Julius Caesar to Abraham Lincoln and the American Civil War, cyphers and cryptography has been a part of history.

[0081] During the second world war, the Germans developed the Enigma machine to have secure communications. Enigma codes were decrypted first in Poland in the late 1930s and then under the secret “Ultra Project” based at Bletchly Park in Buckinghamshire (UK) during the early 1940s. This led to a substantial reduction in the level of allied shipping sunk by German U-boats and together the invention of Radar was arguably one of the most important contributions that electronics made to the war effort. In addition, this work contributed significantly to the development of electronic computing after 1945.

[0082] Organisations in both the public and private sectors have become increasingly dependent on electronic data processing. Vast amounts of digital data are now gathered and stored in large computer data bases and transmitted between computers and terminal devices linked together in complex communications networks. Without appropriate safeguards, these data are susceptible to interception (e.g. via wiretaps) during transmission, or they may be physically removed or copied while in storage. This could result in unwanted exposures of data and potential invasions of privacy. Data are also susceptible to unauthorised deletion, modification, or addition during transmission or storage. This can result in illicit access to computing resources and services, falsification of personal data or business records, or the conduct of fraudulent transactions, including increases in credit authorisations, modification of funds transfers, and the issue of unauthorised payments.

[0083] Legislators, recognizing that the confidentiality and integrity of certain data must be protected, have passed laws to help prevent these problems. But laws alone cannot prevent attacks or eliminate threats to data processing systems. Additional steps must be taken to preserve the secrecy and integrity of computer data. Among the security measures that should be considered is cryptography, which embraces methods for rendering data unintelligible to unauthorised parties.

[0084] Cryptography is the only known practical method for protecting information transmitted through communications networks that uses land lines, communications satellites, and microwave facilities. In some instances, it can be the most economical way to protect stored data. Cryptographic procedures can also be used for message authentication, digital signatures and personal identification for authorising electronic funds transfer and credit card transactions.

[0085] 2.2 Cryptanalysis

[0086] The whole point of cryptography is to keep the plaintext (or the key, or both) secret from eavesdroppers (also called adversaries, attackers, interceptors, interlopers, intruders, opponents, or simply the enemy). Eavesdroppers are assumed to have complete access to the communication between the sender and receiver.

[0087] Cryptanalysis is the science of recovering the plaintext of a message without access to the key. Successful cryptanalysis may recover the plaintext or the key. It also may find weaknesses in a cryptographic system that eventually leads to recovery of the plaintext or key. (The loss of a key though non-cryptanalytic means is called a compromise.)

[0088] An attempted cryptanalysis is called an attack. A fundamental assumption in cryptanalysis (first enunciated by the Dutchman A Kerckhoff) assumes that the cryptanalyst has complete details of the cryptographic algorithm and implementation. While real-world cryptanalysts do not always have such detailed information, it is good assumption to make. If others cannot break an algorithm, even with a knowledge of how it works, then they certainly will not be able to break it without that knowledge.

[0089] There a four principal types of cryptanalytic attacks; each of them assumes that the cryptanalyst has complete knowledge of the encryption algorithm used:

[0090] Cyphertext-only Attack

[0091] The cryptanalyst has the cyphertext of several messages, all of which have been encrypted using the same encryption algorithm. The cryptanalyst's job is to recover the plaintext of as many messages as possible, orto deduce the key (or keys) used to encrypt the messages, in order to decrypt other messages encrypted with the same keys.

[0092] Known-plaintext Attack

[0093] The cryptanalyst not only has access to the cyphertext of several messages, but also to the plaintext of those messages. The problem is to deduce the key (or keys) used to encrypt the messages or an algorithm to decrypt any new messages encrypted with the same key (or keys).

[0094] Chosen-plaintext Attack

[0095] The cryptanalyst not only has access to the cyphertext and associated plaintext for several messages, but also chooses the plaintext that gets encrypted. This is more powerful than a known-plaintext attack, because the cryptanalyst can choose specific plaintext blocks to encrypt those that might yield more information about the key. The problem is to deduce the key (or keys) used to encrypt the messages or an algorithm to decrypt any new messages encrypted with the same key (or keys).

[0096] Adaptive-chosen-plaintext Attack

[0097] This is a special case of a chosen-plaintext attack. Not only can the cryptanalyst choose the plaintext that is encrypted, but can also modify the choice based on the results of previous encryption. In a chosen-plaintext attack, a cryptanalyst might just be able to choose one large block of plaintext to be encrypted; in an adaptive-chosen-plaintext attack it is possible to choose a smaller block of plaintext and then choose another based on the results of the first, and so on.

[0098] In addition to the above, there are at least three other types of cryptanalytic attack.

[0099] Chosen-cyphertext Attack

[0100] The cryptanalyst can choose different cypher-texts to be decrypted and has access to the decrypted plaintext. For example, the cryptanalyst has access to a tamperproof box that does automatic decryption. The problem is to deduce the key. This attack is primarily applicable to public-key algorithms. A chosen-cyphertext attack is sometimes effective against a symmetric algorithm as well. (A chosen-plaintext attack and a chosen-cyphertext attack are together known as a chosen-text attack).

[0101] Chosen-key Attack

[0102] This attack does not mean that the cryptanalyst can choose the key; it means that there is some knowledge about the relationship between different keys—it is a rather obscure attack and not very practical.

[0103] Rubber-hose Cryptanalysis

[0104] The cryptanalyst threatens someone until the key is provided. Bribery is sometimes referred to as a purchase-key attack. This is a critical but very powerful attacks and is often the best way to break an algorithm.

[0105] 2.3 Basic Cypher Systems

[0106] Before the development of digital computers, cryptography consisted of character-based algorithms. Different cryptographic algorithms either substituted characters for one another or transposed characters with one another. The better algorithms did both, many times each.

[0107] Although the technology for developing cypher systems is more complex now, the underlying philosophy remains the same. The primary change is that algorithms work on bits instead of characters. This is actually just a change in the alphabet size from 26 elements to 2 elements. Most good cryptographic algorithms still combine elements of substitution and transposition. In this section, an overview of cypher systems is given.

[0108] 2.3.1 Substitution Cyphers (Including Codes)

[0109] As their name suggests, these preserve the order of the plaintext symbols, but disguise them. Each letter or group of letters is replaced by another letter or group to disguise it. In its simplest form, a becomes D, b becomes E, c becomes F etc.

[0110] More complex substitutions can be devised, e.g. a random (or key controlled) mapping of one letter to another. This general system is called a monoalphabetic substitution. They are relatively easy to decode if the statistical properties of natural languages are used. For example, in English, e is the most common letter followed by t, then a etc.

[0111] The cryptanalyst would count the relative occurrences of the letter in the cyphertext, or look for a word that would be expected in the message. To make the encryption more secure, a polyalphabetic cypher may be used, in which a matrix of alphabets is employed to smooth out the frequencies of the cyphertext letters.

[0112] It is in fact possible to construct an unbreakable cypher if the key is longer than the plaintext, although this method, known as a “one time key” has practical disadvantages.

[0113] 2.3.2 Transposition Cyphers

[0114] A common example, the “column transposition cypher” is shown in Table 2.1. Here the Plaintext is: “This is an example of a simple transposition cypher”. The Cyphertext is:

[0115] “almniefheolpnatnepsorimsripdspiathesaatsicixfeocb” TABLE 2.1 Example of Transposition Cypher K E Y W 0 R D 3 2 7 6 4 5 1 t h i s i s a n e x a m p l e o f a s i m p l e t r a n s p o S i t i o n c i p h e r a b c d e f

[0116] The plaintext is ordered in rows under the key which numbers the columns so formed. Column 1 in the example is under the key letter closest to the start of the alphabet. The cyphertext is then read out by columns, starting with the column whose number is the lowest.

[0117] To break such a cypher, the cryptanalyst must guess the length of the keyword, and order of the columns.

[0118] 2.4 Standardised Computer Cryptography

[0119] At present, there are two serious candidates for standardised computer cryptography. The first, which is chiefly represented by the so-called RSA cypher developed a MIT, is a “public key” system which, by its structure, is ideally suited to a society based upon electronic mail. However, in practice it is slow without special-purpose chips which, although under development, do not yet show signs of mass marketing. The second approach is the American Data Encryption Standard (DES) developed at IBM, which features in an increasing number of hardware products that are fast but expensive and not widely available. The DES is also available in software, but it tends to be rather slow, and expected improvements to the algorithm will only make it slower. Neither algorithm is yet suitable for mass communications, and even then, there is always the problem that widespread or constant use of any encryption algorithm increases the likelihood that an opponent will be able to attack it through analysis. Cyphers or individual keys for cyphers for general applications are best used selectively, and this acts against the idea of using cryptographics to guarantee privacy in mass communications.

[0120] The DES and the RSA cyphers represent a sort of branching in the approach tocryptology. Both proceed from the premise that all practical cyphers suitable for mass-market communications are ultimately breakable, but that security can rest in making the scale of work necessary to do it beyond all realistic possibilities. The DES is the resultof work on improving conventional cryptographic algorithms, and as such lies directly in an historical tradition. The RSA cypher, on the other hand, results more from a return to first mathematical principles, and in this sense matches DESs hard-line practicality with established theoretical principles.

[0121] 2.5 The Strength of Security Systems

[0122] In the 1940s, Shannon conducted work in this area, leading to a theory of secrecy systems. His work assumed an attack based on cyphertext only (i.e. no known plaintext). He identified two basic classes of the encryption problem.

[0123] 2.5.1 Unconditionally Secure

[0124] In this case, the cyphertext cannot be cracked even with unlimited computing power. This can only be achieved in practice if a totally random key is used of length equal to or greater than the equivalent plaintext, i.e. the key is never repeated. This infers that all de cypherment values are equally probable.

[0125] 2.5.2 Computationally Secure

[0126] In this case, cryptanalysis is theoretically possible, but impractical due to the enormous amount of computer power required. Modem encryption systems are of this type.

[0127] Shannon's Security Theories were developed from his work on information theory. The analysis of a noisy communications channel is analogous to that of security via data encryption. The noise can be likened to the encyphering operation.

[0128] In information theory, a message M is transmitted over a noisy channel to a receiver. The message becomes corrupted forming M′. The receiver problem is then to reconstruct M from M′. In an encryption system, M corresponds to the plaintext and M′ to the cyphertext. This approach is central to the techniques developed in this report in which the noise is modelled using Random Scaling Fractal Signals.

[0129] 2.5.3 Perfect Secrecy

[0130] The information theoretic properties of cryptographic systems can be decomposed into three classes of information.

[0131] (i) Plaintext messages M occurring with prior probabilities P(M) where ${\sum\limits_{M}{P(M)}} = 1$

[0132] (ii) Cyphertext messages C occurring with probabilities P(C) where ${\sum\limits_{C}{P(C)}} = 1$

[0133] Keys K chosen with prior probabilities P(K) where ${\sum\limits_{K}{P(K)}} = 1$

[0134] Let P_(C)(M) be the probability that message M was sent, given that C was received (thus C is the encryption of message M). Perfect secrecy is defined by the condition

P _(C)(M)=P(M)

[0135] that is, intercepting the cyphertext gives a cryptanalyst no additional information.

[0136] Let P_(M)(C) be the probability of receiving cyphertext C given that M was sent. Then P(C) is the sum of the probabilities P(K) of the keys K that encypher M as C, i.e. ${P_{M}(C)} = {{\sum\limits_{K}{P(K)}} = 1}$

[0137] Usually there is at most one key K such that the cyphertext is equal to the encryption of M and the key K for given M and C. However, some cyphers can transform the same plaintext into the same cyphertext under different keys.

[0138] A necessary and sufficient condition for perfect secrecy is that for every C,

P _(M)(C)=P(C)∀M

[0139] This means that the probability of receiving a particular cyphertext C given that M was sent (encyphered under some key) is the same as the probability of receiving C given that some other message M′ was sent (encyphered under a different key).

[0140] Perfect secrecy is possible using completely random keys at least as long as the messages they encypher. FIG. 1 illustrates a perfect system with four messages, all equally likely, and four keys, also equally likely. Here P_(C)(M)=P(M)=¼ and P_(M)(C)=P(C)=¼ for all M and C. A cryptanalyst intercepting one of the cyphertext messages C₁, C₂, C₃, or C₄ would have no way of determining which of the four keys was used and, therefore, whether the correct message is M₁, M₂, M₃, or M₄.

[0141] Perfect secrecy requires that the number of keys must be at least as great as the number of possible messages. Otherwise there would be some message M such that for a given C, no K decyphers C into M, implying that P_(C)=0. The cryptanalyst could thereby eliminate certain possible plaintext messages from consideration, increasing the chances of breaking the cypher.

[0142] 2.6 Terminology

[0143] It is necessary at this point to define some terminology which is used later in this work and through the field of Cryptography. The following list provides the principal terms associated with cryptography and cryptanalysis.

[0144] Cypher: A method of secret writing such that an algorithm is used to disguise a message. This is not a code.

[0145] Cyphertext: The message after first modification by a cryptographic process.

[0146] Code: A cryptographic process in which a message is disguised by converting it to cyphertext by means of a translation table (or vice-versa).

[0147] Cryptanalyst: The process by which an unauthorised user attempts to obtain the original message from its cyphertext without full knowledge of the encryption systems.

[0148] Cryptology: Includes all aspects of cryptography and cryptanalysis.

[0149] Decypherment or Decryption: The intended process by which cyphertext is transformed to the original message or plaintext.

[0150] Encypherment or Decryption: The process by which plaintext is converted into cyphertext.

[0151] Key: A variable (or string) used to control the encryption or process.

[0152] Plaintext: An original message or data before encryption.

[0153] Private Key: A key value which is kept secret to one user.

[0154] Public Key: A key which is issued to multiple users.

[0155] Session Key: A key which is used only for a limited time.

[0156] Stenanography: The study of secret communication.

[0157] Trapdoor: A feature of a cypher which enables it to be easily broken without the key, but by possessing other knowledge hidden from other users.

[0158] Weak Key: A particular value of a key which under certain circumstances, enables a cypher to be broken.

[0159] Authentication: A mechanism for identifying that a message is genuine, or of identifying an individual user.

[0160] Bijection: A one-to-one mapping of elements of a set {A } to set {B} such that each A maps to a unique B, and each B maps to a unique A.

[0161] Exhaustive Search: Finding a key by checking each possible value.

[0162] Permutation: Changing the order of a set of data elements.

[0163] 2.7 Possible Uses

[0164] Encryption is one of the basic elements of many aspects of computer security. It can underpin many other techniques, by making possible a required separation between sets of data. Some of the more common uses of encryption are outlined below, in alphabetical order rather than in any order of importance.

[0165] Audit Trail

[0166] An audit trail is a file containing a date and time stamped record of PC usage. When produced by a security product, an audit trail is often known as a security journal. An audit trail itemises what the PC was used for, allowing a security manager (controller) to monitor the user's actions.

[0167] An audit trail should always be stored in encrypted form, and be accessible only to authorised personnel.

[0168] Authentication

[0169] This is a mathematical process used to verify the correctness of data. In the case of a message, authentication is used to verify that the message has arrived exactly as it was sent, and that it was sent by the person who claims to have sent it. The process of authentication requires the application of a cryptographically strong encryption algorithm, to the data being authenticated.

[0170] Cryptographic Checksum

[0171] Cryptographic checksums use an encryption algorithm and an encryption key to calculate a checksum for a specified data set.

[0172] Where financial messages are concerned, a cryptographic checksum is often known as a “Message Authentication Code”.

[0173] Digital Signature

[0174] Digital signatures are checksums that depend on the content of a transmitted message, and also on a secret key, which can be checked without knowledge of that secret key (usually by using a public key).

[0175] A digital signature can only have originated from the owner of the secret key corresponding to the public key used to verify the digital signature.

[0176] On-the-fly Encryption

[0177] Also known as background encryption or auto-encryption, on-the-fly encryption means that data is encrypted immediately before it is written to disk, and encrypted after it has been read back from disk. On-the-fly encryption usually takes place transparently.

[0178] The above list should not be thought of as exhaustive. It does, however, illustrate that encryption techniques are fundamental in most areas of data security, as they can provide a barrier around any desired data.

[0179] Given a cryptographically strong encryption algorithm, this barrier can only be breached by possession of the correct encryption key. In short, the success or failure of encryption techniques depends crucially on the successful application of a key management system.

[0180] 3 Encryption

[0181] 3.1 Introduction

[0182] Encryption is the process of disguising information by creating cyphertext which cannot be understood by an unauthorised person. Decryption is the process of transforming cyphertext back into plaintext which can be read by anyone. Encryption is by no means new. Throughout history, from ancient times to the present day, man has used encryption techniques to prevent messages from being read by unauthorised persons. Such methods have until recent years been a monopoly of the military, but the advent of digital computers has brought encryption techniques into use by various civilian organisations.

[0183] Computers carry out encryption by applying an algorithm to each block ofdata that is to be encrypted. An algorithm is simply a set of rules which defines a method of performing a given task. Encryption algorithms would not be much use if they always gave the same cyphertext output for a particular plaintext input. To ensure that this does not happen, every encryption algorithm requires an encryption key. The algorithm uses the encryption key, which is changed at will, as part of the process of encryption. The basic size of each data block that is to be encrypted, and the size of the encryption key has to be precisely specified by every encryption algorithm.

[0184] The whole point of designing an encryption algorithm is to make sure that it cannot be “cracked”. In simple terms, this means that the possibility of finding out the original plaintext from the corresponding cyphertext, without knowing the appropriate encryption key, must be so small as to be discounted in practical terms. If this is true for a particular encryption algorithm, then the algorithm is said to be “cryptographically strong”. Encryption can be used very effectively in protecting data stored on disk, or data transmitted between two PCs, from unauthorised access. Encryption is not a cure-all; it should be applied selectively to information which really does need protecting. After all, the owner of a safe does not keep every single document in the safe; it would soon become full and therefore useless. The penalty paid for overuse of encryption techniques is that throughput and response times are severely affected.

[0185] Since the late 1970s, the mathematics of encryption has developed along two very distinct paths. This followed the invention of public key cryptography, which enabled encryption algorithms where the keys were said to be asymmetric, i.e. the encryption key and the decryption key were no longer required to be the same. This is discussed later.

[0186] 3.1.1 Encryption Notation

[0187] The basic operation of an encryption system is to modify some plaintext (referred to as P) to form some cyphertext (referred to as C) under the control of a key K. The encryption operation is often represented by the symbol E so that we can write

C=E _(K)(P)

[0188] i.e. Cyphertext=Encryption of P under key K.

[0189] The decryption operation, D should restore the plaintext. We can write

P=D _(K)(C)

[0190] A general model for a cryptographic system may now be drawn as illustrated in FIG. 2.

[0191] This model also shows the communication of the cyphertext from transmitter (encryption) to receiver (decryption) and the possible actions of an intruder or cryptanalyst. The intruder may be passive, and simply record the cyphertext being transmitted or active. In this latter case, the cyphertext may be changed as it is transmitted, or new cyphertext inserted.

[0192] 3.1.2 Symmetric Algorithms

[0193] By definition, a symmetric encryption algorithm is one where the same encryption key is required for encryption and decryption. This definition covers most encryption algorithms used through history until the advent of public key cryptography. When a symmetric algorithm is applied, if decryption is carried out using an incorrect encryption key, then the result is usually meaningless.

[0194] The rules which define a symmetric algorithm contain a definition of what sort of encryption key is required, and what size of data block is encrypted for each execution of the encryption algorithm. For example, in the case of the DES encryption algorithm, the encryption key is always 56 bits, and each data block is 64 bits long.

[0195] Symmetric encryption (FIG. 3) takes an encryption key and a plaintext datablock, and applies the encryption algorithm to these to produce a cyphertext block.

[0196] Symmetric decryption (FIG. 4) takes a cyphertext block, and the key used for encryption, and applies the inverse of the encryption algorithm to recreate the original plaintext data block.

[0197] 3.1.3 Asymmetric Algorithms

[0198] An asymmetric encryption algorithm requires a pair of keys, one for encryption and one for decryption. The encryption key is published, and is freely available for anyone to use. The decryption key is kept secret. This means that anyone can use the encryption key to perform encryption, but decryption can only be performed by the holder of the decryption key. Note that the encryption key really can be “published” in the true sense of the word, there is no need to keep the value of the encryption key secret. This is the origin of the phrase “public key cryptography” for this type of encryption system; the key used to perform encryption really is a “public” key.

[0199] One clear advantage of an asymmetric encryption algorithm over a conventional symmetric encryption algorithm is that when asymmetric encryption is used to protect information transmitted between two sites, the same key does not need to be present at both sites. This presents a clear advantage when key management is being considered. Asymmetric encryption takes an encryption key and a plaintext datablock, and applies the encryption algorithm to these to produce a cyphertext block. Asymmetric decryption takes a cyphertext block, and the key used for decryption, and applies the decryption algorithm to these two to recreate the original plaintext data block.

[0200] 3.1.4 Choice of Algorithm

[0201] When the decision to use encryption for some purpose has been taken, the choice of which particular encryption algorithm to use must then be made. Unless one has a technical knowledge of cryptography, and access to technical details of the encryption algorithm in question, one golden rule applies: if at all possible stick to published, well tested, encryption algorithms. This is not to say that unpublished encryptionalgorithms are cryptographically weak, only that without access to published details of how an encryption algorithm works, it is very difficult for anyone other than the original designer(s) of the algorithm to have any idea of its strength.

[0202] A major problem with encryption systems is that with two exceptions (see below), manufacturers tend to keep the encryption algorithm a heavily guarded secret. As a purchaser, how does one know whether the encryption algorithm is any good? In general, it is not possible to establish the quality of an algorithm and the purchaser is therefore forced to take a gamble and trust the manufacturer. No manufacturer is ever going to admit that their product uses an encryption algorithm that is inferior; such information is only ever obtained by those specifically investigating the algorithm/product for weaknesses.

[0203] One argument that is in favour of secret encryption algorithms is that the very secrecy of the algorithms adds to the “security” offered by it. Although this may be true, and is put forward almost universally by government users of encryption, such advantages are usually ephemeral. Government users have the resources to ensure that an encryption algorithm is thoroughly studied, and can insist upon being provided with details of how the encryption algorithm works (in confidence). They do not suffer from using poor encryption algorithms which hide their weaknesses behind a veil of secrecy, as they make sure that their encryption algorithms are unpublished, but extensively studied. For commercial usage, the best test of an algorithms strength is probably the fact that details of the encryption algorithm have been published, extensively scrutinised by mathematicians and cryptographers, and no compromising attacks have been published as a result.

[0204] All unpublished proprietary algorithms are weak to a greater or lesser degree. The important question is, how weak? Unless there is access to technical cryptographic competence, and a helpful supplier of encryption products, the only real solution is to use an algorithm for which all the relevant details have been published. There are possibly only two encryption algorithms for which this has been done that remained cryptographically strong after publication and the consequent intense security. These are the asymmetric RSA public key algorithm, and the symmetric DES algorithm. RSA is primarily used for key management whilst the DES algorithm is routinely used in the financial world.

[0205] If a proprietary encryption algorithm is used which is offered by many manufacturers, then the user is at the mercy of the designer of the algorithm. No matter what the specifications, there is no sample way to prove that an encryption algorithm is cryptographically strong. The converse, however, is not true. Any design fault in an encryption algorithm can reduce the algorithm to the point at which it is trivial to compromise. In general, it is not possible to establish whether an unpublished encryption algorithm is cryptographically strong, but it may be possible to establish (the hard way) that it is terminally weak! Unpublished proprietary encryption algorithms are often used as a means of speed the encryption process whilst still appearing to remain secure. If the details of all unpublished encryption algorithms were available publicly, it would probably reveal a whole spectrum of algorithm strength—from the sublime to the ridiculous. Without such details much has to be taken on trust.

[0206] 3.2 Encryption Keys: Private and Public

[0207] Complex cyphers use a secret key to control a long sequence of complicated situations and transpositions. Substitution cyphers replace the actual bits, characters, or blocks of characters with substitutes e.g. one letter replaces another letter. Julius Caesar's military use of such a cypher was the first clearly documented case. In Caesar's cypher each letter of an original message is replaced with the letter three places beyond it in the alphabet. Transposition cyphers rearrange the order of the bits, characters, or blocks of characters that are being encrypted and decrypted. There are two general categories of cryptographic keys: Private key and Public key systems.

[0208] Private key systems use a single key. The single key is used both to encrypt and decrypt the information. Both sides of the transmission need a separate key and the key must be kept secret. The security of the transmission will depend on how well the key is protected. The US Government developed the Data Encryption Standard (DES) which operates on this basis and it is the actual US standard. DES keys are 56 bits long and this means that there are 72 quadrillion different possible keys. The length of the key has been criticised and it has been suggested that the DES key was designed to be long enough to frustrate corporate eavesdroppers, but short enough to be broken by the National Security Agency.

[0209] Export of DES is controlled by the US State Department. The DES system is becoming insecure because of its key length. The US government has offered to replace the DES with a new algorithm called Skipjack which involves escorted encryption. The technology is based on a tamper-resistant hardware chip (the Clipper Chip) that implements an NSA designed encryption algorithm called Skipjack, together with a method that allows all communications encrypted with the chip (regardless of what session key is used or how it is selected) to be decrypted through a special chip, unique key and a special Law Enforcement Access Field transmitted with the encrypted communications.

[0210] In the public key system, there are two keys: a public and a private key. Each user has both keys, and while the private key must be kept secret, the public key is publicly known. Both keys are mathematically related. If A encrypts a message with a private key, then B the recipient of the message, can decrypt it with A's public key. Similarly, anyone who knows A's public key can send a message by encrypting it with the public key. A will then decrypt it with the private key. Public key cryptography was developed in 1977 by Rivest, Shamir and Adleman (RSA) in the US. This kind of cryptography is more efficient than the private key cryptography because each user has only one key to encrypt and decrypt all the messages that are received. Pretty Good Privacy (PGP), an encryption software for electronic communications written by Philip R Zimmerman, is an example of public key cryptography.

[0211] 3.2.1 Key Generation

[0212] An encryption key should be chosen at random from a very large number of possibilities. If the number of possible keys is small, then any potential attacker can simply try all possible encryption keys before stumbling across the correct one. If the choice of encryption key is not random, then the sequence used to choose the key could itself be used to guess which key is in use at any particular time.

[0213] The length of the key required is always set by the particular encryption algorithm in use. Thus key generation requires the production of a sequence of random bits of some stated length. This gives rise to a problem. All random number generators that operate entirely in software, with no external influence, are only pseudo random. They are mere sequence generators, but the sequence can of course be of very great length. The only way to generate truly random numbers is to use external hardware, or external stimuli, which go beyond the confines of a strictly software random number generator. The designers of hardware equipment go to great lengths to incorporate random bit generators which use random electrical noise as the source of random bits. However, this is expensive and difficult to design with any degree of reliability. For software encryption packages, the option of special hardware is not available. The best compromise is a long sequence, random number generator, with access to a time of day clock included to add an extra element of randomness.

[0214] Ideally, key generation should always be random—which precludes inventing an encryption key, and entering it at the keyboard. Humans are very bad at inventing random sets of characters, because patterns in character sequences make it much easier for them to remember the encryption key. The worst option of all for key generation is to allow keys to be invented by a user as words, phrases or numbers. This should be avoided if at all possible.

[0215] If an encryption system of any kind requires the encryption key to be entered by the user, and offers no possibility of using encryption keys which are random, it should not be treated seriously. It is often necessary to have the facility to be able to enter a known encryption key in order to communicate with some other system that provided the encryption key. However, this key should itself be randomly generated.

[0216] Key generation should under no circumstances be treated lightly. Key management and the design of cryptographically strong encryption algorithms it is one of the truly vital components of any encryption scheme. In this work, we investigate the use of keys using chaos generators rather than pseudo-random number generators.

[0217] 3.2.2 Key Management

[0218] Once an encryption key has been generated, how it is managed then becomes of paramount importance. Key management comprises choosing, distributing, changing, and synchronizing encryption keys. Key generation can be thought of as similar to choosing the combination for the lock on a safe. Key management is making sure that the combination is not disclosed to any unauthorised person. Encryption offers no protection whatsoever if the relevant key(s) become known to an unauthorised person, and under such circumstances may even induce a false sense of security.

[0219] To facilitate secure key management, encryption keys are usually formed intoa key management hierarchy. Encryption keys are distributed only after they have themselves been encrypted by another encryption key, known as a “key encrypting key”, which is only ever used to encrypt other keys for the purposes of transportation or storage. It is never used to encrypt data. At the bottom of a key management hierarchy are data encrypting keys. This is a term used for an encryption key which is only ever used to encrypt data (not other keys). At the top of a key management hierarchy is an encryption key known as the master key. The only constraints on the number of distinct levels involved in a key management hierarchy are practical ones, but it is rare to come across a key management hierarchy with more than three distinct levels.

[0220] It should be appreciated that if there was a secure way to transmit a master key from one site to another, without humans being involved in the process, then that method would itself be used for the transmission of encrypted data. The master key would then not be required. Therefore, such a method does not exist, and cannot ever exist. No matter how complex a key management hierarchy is, the master key must always be kept secret by human means. This requires trusted personnel, and manual entry of the master key, which should be split into two or more components to help preserve its integrity. Each component of the master key is known only to one person, and all components must be individually entered before they are recombined to form the complete master key. Such a system cannot be compromised unless all the personnel involved are compromised, as any individual component of the master key is useless by itself.

[0221] Once an encryption key has itself been encrypted by a “key encrypting key” from a higher level in the key management hierarchy, then it can be transmitted or stored with impunity. There is no requirement to keep such encrypted keys secret. Keys that have been encrypted in this manner are typically written on to a floppy disk for storage, transmitted across networks, stored on EPROM or EEPROM, or written to magnetic strips cards. A key management hierarchy makes the security of the actual medium used for transmission or storage of encrypted keys completely irrelevant. There is no point in setting up an encryption system, and then executing the key management in a sloppy insecure way. Doing nothing is preferable.

[0222] 3.3 Super Encypherment

[0223] The encypherment process used during key management can be strengthened by using triple encypherment. Two encryption keys are required for this process, which has the same effect, in cryptographic strength terms, as using a double length encryption key, each single encypherment is replaced by the following process: (i) encypher with key #1; (ii) decypher with key #2; (iii) encypher with key #1. Decryption is similarly achieved using: (i) decypher with key #1; (ii) encypher with key #2; decypher with key #1.

[0224] Other more complicated methods of super encypherment are possible; all of them involve increasing the number of calls to the basic encryption algorithm. The time required for an encryption is linear with the number of keys used, but the strength is exponential with key length. Hence doubling the key length has an enormous effect on the cryptographic strength of an encryption algorithm.

[0225] 3.4 Encrypting Communications Channels

[0226] In theory, this encryption can take place at any layer in the Open Systems Interface (OSI) communications model. In practice, it takes place either at the lowest layers (one or two) or at higher layers. If it takes place at the lowest layers, it is called link-by-link encryption; everything going through a particular data link is encrypted. If it takes place at higher layers, it is called end-to-end encryption; the data are encrypted selectively and stay encrypted until they are decrypted by the intended final recipient. Each approach has its own benefits and drawbacks.

[0227] 3.4.1 Link-by-Link Encryption

[0228] The easiest place to add encryption is at the physical layer. This is called link-by-link encryption. The interfaces to the physical layer are generally standardised and it is easy to connect hardware encryption devices at this point. These devices encrypt all data passing through them, including data, routing information, and protocol information. They can be used on any type of digital communication link. On the other hand, any intelligent switching or storing nodes between the sender and the receiver need to decrypt the data stream before processing it.

[0229] This type of encryption is very effective because everything is encrypted. A cryptanalyst can get no information about the structure of the information. There is no idea of who is talking to whom, the length of the messages they are sending are, what times of the day they communicate, and so on. This is called traffic-flow security: the enemy is not only denied access to the information, but also access to the knowledge of where and how much information is flowing.

[0230] Security does not depend on any traffic management techniques. Key management is also simple, only the two endpoints of the line need a common key, and they can change their key independently from the rest of the network.

[0231] Imagine a synchronous communications line, encrypted using 1-bit CFB. After initialization, the line can run indefinitely, recovering automatically from bit or synchronisataion errors. The line encrypts whenever messages are sent from one end to the other, otherwise it just encrypts and decrypts random data. There is no information on when messages are being sent and when they are not; there is no information on when messages begin and end. All that is observed is an endless stream of random-looking bits.

[0232] If the communications line is asynchronous, the same 1-bit CFB mode can be used. The difference is that the adversary can get information about the rate of transmission. If this information must be concealed, then some provision for passing dummy messages during idle times is required.

[0233] The biggest problem with encryption at the physical layer is that each physical link in the network needs to be encrypted; leaving any link unencrypted jeopardises the security of the entire network. If the network is large, the cost may quickly become prohibitive for this kind of encryption.

[0234] Additionally, every node in the network must be protected, since it processes unencrypted data. If all the network's users trust one another, and all nodes are in secure locations, this may be tolerable. But this is unlikely. Even in a single corporation, information might have to be kept secret within a department. If the network accidentally misroutes information, anyone can read it.

[0235] 3.4.2 End-to-End Encryption

[0236] Another approach is to put encryption equipment between the network layer and the transport layer. The encryption device must understand the data according to the protocols up to layer three and encrypt only the transport data units, which are then recombined with the unencrypted routing information and sent to lower layers for transmission.

[0237] This approach avoids the encryption/decryption problem at the physical layer. By providing end-to-end encryption, the data remains encrypted until it reaches its final destination. The primary problem with end-to-end encryption is that the routing information for the data is not encrypted; a good cryptanalyst can learn much from who is talking to whom, at what times and for how long, without ever knowing the contents of those conversations. Key management is also more difficult, since individual users must make sure they have common keys.

[0238] Building end-to-end encryption equipment is difficult. Each particular communications system has its own protocols. Sometimes the interfaces between the levels are not well-defined, making the task even more difficult.

[0239] If encryption takes place at a high layer of the communications architecture, like the applications layer or the presentation layer, then it can be independent of the type of communication network used. It is still end-to-end encryption, but the encryption implementation does not have to be bothered about line codes, synchronisataion between modems, physical interfaces, and so forth. In the early days of electromechanical cryptography, encryption and decryption took place entirely off-line, this is only one step removed from that.

[0240] Encryption at these high layers interacts with the user software. This software is different for different computer architectures, and so the encryption must be optimised for different computer systems. Encryption can occur in the software itself or in specialised hardware. In the latter case, the computer will send the data to the specialised hardware for encryption before sending it to lower layers of the communication architecture for transmission. This process requires some intelligence and is not suitable for dumb terminals. Additionally, there may be compatibility problems with different types of computers.

[0241] The major disadvantage of end-to-end encryption is that it allows traffic analysis. Traffic analysis is the analysis of encrypted messages: where they come from, where they go to, how long they are, when they are sent, how frequent or infrequent they are, whether they coincide with outside events like meetings, and more. A lot of good information is buried in this data, and is therefore important to a cryptanalyst.

[0242] 3.4.3 Combining the Two

[0243] Combining the two, whilst most expensive, is the most effective way of securing a network. Encryption of each physical link makes any analysis of the routing information impossible, while end-to-end encryption reduces the threat of unencrypted data at the various nodes in the network. Key management for the two schemes can be completely separate. The network managers can take care of encryption at the physical level, while the individual users have responsibility for end-to-end encryption.

[0244] 3.5 Hardware Encryption Versus Software Encryption

[0245] 3.5.1 Hardware

[0246] Until very recently, all encryption products were in the form of specialised hardware. These encryption/decryption boxes plugged into a communications line and encrypted all the data going across the line. Although software encryption is becoming more prevalent today, hardware is still the embodiment of choice for military and serious commercial applications. The NSA, for example, only authorises encryption in hardware. There are a number of reasons why this is so. The first is speed. The two most common encryption algorithms, DES and RSA, run inefficiently on general-purpose processors. While some cryptographers have tried to make their algorithms more suitable for software implementation, specialised hardware will always win a speed race. Additionally, encryption is often a computation-intensive task. Tying up the computer's primary processor for this is inefficient. Moving encryption to another chip, even if that chip is just another processor, makes the whole system faster.

[0247] The second reason is security. An encryption algorithm running on a generalised computer has no physical protection. Hardware encryption devices can be security encapsulated to prevent this. Tamperproof boxes can prevent someone from modifying a hardware encryption device. Special-purpose VLSI chips can be coated with a chemical such that any attempt to access their interior will result in the destruction of the chip's logic.

[0248] The final reason for the prevalence of hardware is the ease of installation. Most encryption applications do not involve general-purpose computers. People may wish to encrypt their telephone conversations, facsimile transmissions, or data links. It is cheaper to put special-purpose encryption hardware in telephones, facsimile machines, and modems than it is to put in a microprocessor and software.

[0249] The three basic kinds of encryption hardware on the market today are: self-contained encryption modules (that perform functions such as password verification and key management for banks), dedicated encryption boxes for communications links and boards that plug into personal computers.

[0250] More companies are starting to put encryption hardware into their communications equipment. Secure telephones, facsimile machines, and modems are all available.

[0251] Internal key management for these devices is generally secure, although there are as many different schemes as there are equipment vendors. Some schemes are more suited for one situation than another and buyers should know what kind of key management is incorporated into the encryption box and what they are expected to provide themselves.

[0252] 3.5.2 Software

[0253] Any encryption algorithm can be implemented in software. The disadvantages are in speed, cost and ease of modification (or manipulation). The advantages are in flexibility and portability, ease of use, and ease of upgrade. Software based algorithms can be inexpensively copied and installed on many machines. They can be incorporated into larger applications, such as communication programs and, if written in a portable language such as C/C++, can be used and modified by a wide community.

[0254] Software encryption programs are popular and are available for all major operating systems. These are meant to protect individual files; the user generally has to manually encrypt and decrypt specific files. It is important that the key management scheme be secure. The keys should not be stored on disk anywhere (or even written to a place in memory from where the processor swaps out to disk). Keys and unencrypted files should be erased after encryption. Many programs are sloppy in this regard, and a user has to choose carefully.

[0255] A local programmer can always replace a software encryption algorithm with something of lower quality. But for most users, this is not a problem. If a local employee can break into the office and modify an encryption program, then it is also possible for that individual to set up a hidden camera on the wall, a wiretap on the telephone, and a TEMPEST detector along the street. If an individual of this type is more powerful than the user, then the user has lost the game before it starts.

[0256] 3.6 Software Encryption Products

[0257] This topic attempts to place the data encryption techniques described in this report in its proper context amongst the many other security products that are currently available for the PC. It should in no way be thought of as an attempt to cover the whole range of products that are available. This is done very effectively by the many “Security Product Guides” that are published annually. Similarly, only a few commonly used products are described. All of the products discussed below are readily available.

[0258] 3.6.1 Symmetric Algorithm Products

[0259] The following software packages use a symmetric encryption algorithm. They often offer encryption as just one of many other security features.

[0260] Datasafe is a memory-resident encryption utility, supplied on a copy protected disk. It intercepts DOS system calls, and applies encryption using a proprietary key unique to each copy of Datasafe. Using a different password for each file ensures unique encryption. Datasafe detects whether a file is encrypted, and can distinguish an encrypted file from a plaintext file. On-the-fly encryption is normally performed using a proprietary algorithm, but DES encryption is available using a stand-alone program.

[0261] Decrypt is a DES implementation for the 8086/8088 microprocessor family (as used in early PCs). Decrypt is designed to be easy to integrate into many types of program and specified hardware devices, such as hardware encryptors and point of sale terminals.

[0262] Diskguard is a software package which provides data encryption using the DES algorithm. One part of Diskguard is memory-resident, and may be accessed by an application program. This permits encryption of files, and/or blocks of memory. The second part of Diskguard accesses the memory-resident part through a menu-driven program. Each file is protected by a different key, which is in turn protected by its own password. Electronic Code Book and cypher Feedback modes of encryption can be used.

[0263] File-Guard is a file encryption program which uses a proprietary algorithm. File-Guard encrypts files and/or labels them as “Hidden”. Files which are marked as hidden do not appear in a directory listing.

[0264] Fly uses a proprietary algorithm, and an 8-character encryption key, to encrypt a specified file. The original file is always overwritten, therefore, once encryption is complete, no plaintext data from the original file remains on the disk. Overwriting the original plaintext could have interesting consequences if the PC experienced a power cut during the encryption process.

[0265] N-Code is a menu driven encryption utility for the MS-DOS operating system which uses a proprietary algorithm. Each encryption key can be up to 20 alphanumeric characters long, and is selected by the N-Code user. Access to the encryption functions provided by N-Code is password protected. A user can choose to encrypt just one file, many files within a subdirectory, or an entire disk subdirectory. The original plaintext file can either be left intact, or over-written by the encrypted data.

[0266] P/C Privacy is a file encryption utility available for a large number of operating systems ranging from MS-DOS on a PC, to VMS on a DES system, and/or MVS on a large IBM mainframe. P/C Privacy uses a proprietary encryption algorithm, and each individual encryption key can be up to 100 characters long. Every encrypted file is constrained to printable characters only. This helps to avoid many of the problems encountered during transmission of an encrypted file via modems and/or networks. This technique also increases the encrypted file size to roughly twice the size of the original plaintext file.

[0267] Privacy Plus is a software files encryption system capable of encrypting any type of file stored on any type of disk. Encryption is carried out using either the DES encryption algorithm, or a proprietary algorithm. Privacy Plus can be operated from batch files or can be menu driven. Memory-resident operation is possible if desired. Encrypted files can be hidden to prevent them appearing in a directory listing. An option is available which permits the security manager to unlock a user's files if the password has been forgotten, or the user has left the company. Note that this means that the encryption key, or a pointer to the correct encryption key, must be stored within every encrypted file. An option is also available which imposes multi-level security on top of Privacy Plus.

[0268] SecretDisk provides on-the-fly encryption of files stored in a specially prepared area of a disk. It works by constructing a hidden file on the disk (hard or floppy), and providing the necessary device drivers to persuade MS-DOS that this is a new drive. All files on a Secret Disk are encrypted using an encryption key formed from a password entered by the user. No key management is implemented, the password is simply committed to memory. If this password is forgotten, then there is no way to retrieve the encrypted data. Also included with Secret Disk is a DES file encryption utility, but again with no key management facilities. With a Secret Disk initialised, a choice must be made between using a proprietary encryption algorithm, and the DES algorithm. This choice affects the performance of Secret Disk drastically as the DES version of Secret Disk is about 50 times slower than the proprietary algorithm.

[0269] Ultralock encrypts data stored in a disk file. It resides in memory, capturing and processing file requests to ensure that all files contained within a particular file specification are encrypted when stored on disk. For example, the specification “B:MY*.TXT” encrypts all files created on drive B whose filename begins with “MY” that have an extension of “TXT”. Overlapping specifications can be given, and Ultralock will derive the correct encryption key. A user has the power to choose which files are encrypted, therefore, Ultralock encryption is discretionary in nature, not mandatory. The key specification process is extremely flexible, and allows very complex partitions between various types of files to be achieved. Ultralock uses its own, unpublished, proprietary encryption algorithm.

[0270] VSF-2 is a multi level data security system for the MS-DOS operating system. VSF-2 encrypts files on either a hard disk or floppy disk. A positive file erasure facility is included. The user must choose the file to be secured, and the appropriate security level (1 to 3). At level 1, the file is encrypted but still visible in a directory listing. Level 2 operation encrypts the file, but also makes the encrypted result a hidden file. Level 3 operation ensures that the file is erased if three unsuccessful decryption attempts are made.

[0271] There are many software encryption products available, and it should be obvious from the above list that a great number of them offer encryption using a proprietary (unpublished) algorithm. This must be approached with caution, as is discussed in depth at various places throughout this report. Over half of the products offer DES encryption, often as an adjunct to the “fast” proprietary algorithm. The promotional literature tends to imply that a user will be far better off using the proprietary algorithm as it executes far faster than the DES algorithm. This may be true, and it tends to make many of the products that offer on-the-fly encryption bearable; but at what expense?

[0272] Only two of the products discussed above offer key management facilities. This is a low percentage of the total number of products. Most of the software packages rely on the user entering the encryption key at runtime, rather like a password. In fact, many of them inextricably confuse the concepts of encryption key and password. Some products even manage to confuse the concepts of encryption key and encryption algorithm, by discussing variable algorithms. Key management is crucial. Humans are very poor at remembering encryption keys, and even worse at keeping an encryption key secret.

[0273] It is possible to obtain a software package offering just about any desired combination of features. Therefore, it is vitally important to analyse the reasons behind making the decision to use encryption. If these reasons are not clear, then the danger of purchasing an unsuitable product is increased. Products which provide encryption in software have one major advantage over all the other products discussed in the following sections—price. They are often an order of magnitude cheaper than the equivalent hardware product.

[0274] 3.6.2 Asymmetric Algorithm Products

[0275] The following software packages use an asymmetric encryption algorithm. They often offer encryption as one of many security features.

[0276] Crypt Master is a software security package which uses the RSV public key encryption algorithm with a modulus length of 384 bits. Crypt Master can provide file encryption and/or digital signatures for any type of file. The RSA algorithm can be used as a key management system to transport encryption keys for a symmetric, proprietary encryption algorithm. This symmetric algorithm is then used for bulk file encryption. Digital signatures are provided using the RSA algorithm.

[0277] Public is a software package which uses the RSA public key encryption algorithm (with a modulus length of 512 bits) to secure transmitted messages. Encryption is used to prevent message inspection. The RSA algorithm is used to securely transport encryption keys for either the DES algorithm, or a proprietary encryption algorithm—one of which is used to encrypt the content of a specified file. Digital signatures are used to prevent message alteration. The asymmetry of the RSA algorithm permits a digital signature to be calculated with a secret RSA key which can be checked using the corresponding public RSA key. In a hierarchical menu system, public key management facilities and key generation software are all included.

[0278] MailSafe is a software package which uses the RSA public key encryption algorithm to encrypt and/or authenticate transmitted data. Key generation facilities are included, and once a pair of RSA keys have been generated, they can be used to design and/or encrypt files. Signing a file appends an RSA digital signature to the original data. This signature can be checked at any time. Utilities are available which offer data compression, management of RSA keys, and connections to electronic mail systems.

[0279] Unlike the products which offer encryption using a symmetric encryption algorithm, the above products are all that seem to be currently available which offer RSA encryption as a software package (the symmetric encryption products were selected from a long list). All but one of the products offer digital signature facilities as well as RSA encryption and decryption. Key management problems change their nature when public key algorithms are used. The basic problem becomes one of guaranteeing that a received public key is authentic. Given the complex (and slow) mathematics required to generate a public/secret key pair, and the slow encryption speed, these RSA software packages are often used to transfer keys for a symmetric encryption algorithm in a secure manner. Some of the packages even have in-built symmetric encryption facilities. The price advantage enjoyed by software packages which use a symmetric encryption algorithm does not spill over into products using RSA. They tend to be highly priced, sometimes almost as much as the products discussed below, which include special purpose hardware. This is merely a reflection of the size of the market for RSA products. This in itself is a reflection of the speed of encryption. Software based RSA is not suitable for slow PC's.

[0280] 3.6.3 Location of Encryption

[0281] As with most PC products, software solutions are almost universally cheaper than the equivalent hardware products. When data held on disk is to be protected by encryption, it is always difficult to decide the level at which to operate. Too high in the DOS hierarchy, and the encryption has difficulty in copying with the multitude of ways in which applications can use DOS. Too low in the DOS hierarchy and key management becomes difficult, as the link between a file name and its associated data may be lost in track/sector formatting. Various solutions are possible:

[0282] (i) Treat encryption as a DOS application and let the user add encryption. This is how the encryption utility programs operate.

[0283] (ii) Try to process every DOS function call; this is how the on-the-fly encryption utilities work.

[0284] (iii) Impose encryption at the level of disk access, but remain high enough to permit encryption to be selected on the basis of the MS-DOS filenames. Ultralock seems to be somewhat unique in that it succeeds in existing at this level. It imposes encryption on the basis of file names (and/or extensions) whilst residing in memory. The penalty is that versions of Ultralock are specific to particular versions (or range of versions) of MS-DOS.

[0285] In reality the choice is usually between a proprietary algorithm for on-the-fly encryption, and either DES or RSA for secure encryption on a specific file-by-file basis. It is not advisable to invest in encryption packages which use a secret encryption algorithm (often called a proprietary algorithm), unless there is complete confidence in the company that designed the product. This confidence should be based on the designer of the product and not the salesman.

[0286] 4 Data Compression

[0287] 4.1 Introduction

[0288] The benefits of data compression have always been obvious. If a message can be compressed n times, it can be transmitted in 1/n of the time, or transmitted at the same speed through a channel with 1/n of the bandwidth. It can also be stored in 1/n of the volume of the original. A typical page of text that has been scanned requires megabytes, instead of kilobytes, of storage. For example, an 8.5 times 11 inch (U.S. standard letter size) page scanned at 600 times 600 dpi requires about 35 MB of storage at 8 bits per pixel—three orders of magnitude more than a page of ASCII text. Fortunately, most pages of text have significant redundancy, and a pixel map of these pages can be processed in order to store the page in less memory than the raw pixel map. This process of eliminating the redundancy in order to save storage space is called compression, or often data encoding. The success of the compression operation often depends on the amount of processing power that can be applied. The result of the compression is measured by the compression ratio (CR), which is defined as the ratio of the number of bits in the data before compression to the number of bits after compression. Although the storage cost per bit is about half a millionth of a dollar, a family album with several hundred photos can cost more than a thousand dollars to store! This is one area where image compression can play an important role. Storing images with less memory cuts cost. Another useful feature of image compression is the rapid transmission of data; fewer data requires less time to send, So how can data be compressed? Mostly, data contain some amount of redundancy that can sometimes be removed when the data is stored, and replaced when it is restored. However, eliminating this redundancy does not necessarily lead to high compression. Fortunately, the human eye is insensitive to a wide variety of information loss. That is, an image can be changed in many ways that are either not detected by the human eye or do not contribute to “degradation” of the image. If these changes lead to highly redundant data, then the data can be greatly compressed when the redundancy can be detected. For example, the sequence 2, 0, 0, 2, 0, 2, 2, 0, 0, 2, 0, 2, . . . is (in some sense) similar to 1, 1, 1, 1, 1 . . . , with random fluctuations of ±1. If the latter sequence can serve our purpose as well as the first, we would benefit from storing it in place of the first, since it can be specified very compactly. How much can a document be compressed? This depends on several factors that can only be approximated, even if we answer the following questions. What type of document is it—text, line art, gray-scale, or halftone? What is the complexity of the document? What sampling resolution was used to scan the input page? What computing resources are we willing to devote to the task? How long can we afford to process the image? Which compression algorithm are we going to use? Usually we can only estimate the achievable CR, based on the results of similar sets of documents under similar conditions. Compression ratios in the range of 0.5 to 200 are typical, depending on the above factors. (A CR less than 1.0 means that the algorithm has expanded the image instead of compressing it. This is common in the compression of halftone images.) The CR is a key parameter, since transmission time and storage space scale with its inverse. In some cases, images can be processed in the compressed domain, which means that the processing time also scales with the inverse of the CR.Compression is extremely important in document image processing because of thesize of scanned images.

[0289] 4.1.1 Information Theory

[0290] Messages are transmitted in order to transfer information. Most messages have a certain amount of redundancy in addition to their information. Compression is achieved by reducing the amount of redundancy in a message while retaining all or most of its information. What is information? A binary communication must have some level of uncertainty in order to communicate information. Similarly, with an electronic image of a document, large areas of the same shade of gray do not convey information. These areas are redundant and can be compressed. A text document, for example, usually contains at least 95\% white space and can be compressed effectively. The various types of redundancies that can occur in documents are as follows: (i) sparse coverage of a document; (ii) repetitive scan lines; (iii) large smooth gray areas; (iv) large smooth halftone areas; (v) ASCII code, always 8 bits per character; (vi) double characters; (vii) long words, frequently used.

[0291] Entropy

[0292] Entropy E is a quantitative term for the amount of information in a string of symbols and is given by the following expression $E = {- {\sum\limits_{i = 1}^{N}\quad {P_{i}\log_{2}P_{i}}}}$

[0293] where P_(i) is the probability of occurrence of each one of N independently occurring symbols. As an example, if we have a binary image with equal random probabilities of black and white pixels of 0.5 say, then the entropy is E=−[0.5×(−1.0)]−[0.5×(−1.0)]=1.0 bit of information per bit transmitted. On the other hand, if the probability of blackis 0.05 and the probability of white is0.95, the Entropy is equal to 0.22+0.07=0.29 bit per bit. As the probability of a block binary bit changes from 0.0 to 1.0, the total entropy varies from 0.0 to a peak of 1.0 and back to a value of 0.0 again. A basic ground rule of compression systems is that more frequent messages should be shorter, while less frequent messages can be longer.

[0294] 4.2 Binary Data Compression

[0295] Most binary compression schemes are information-preserving, so that when binary data is compressed and then expanded, it will be exactly the same as the original, assuming that no errors have occurred. Gray-scale compression schemes, on the other hand, are often non-information-preserving, or lossy. Some “unimportant” information is discarded, so that better compression results can be achieved. If more information is discarded, a higher CR will result, but a point will be reached where the decompressed image will have a degraded appearance. Some of the techniques used in binary compression systems are listed below. Often several of these techniques are used in one compression system: (i) packing; (ii) run length coding; (iii) huffman coding; (iv) arithmetic coding; (v) predictive coding; (vi) READ coding; (vii) JPEG compression.

[0296] 4.2.1 Runlength Coding

[0297] Run-length coding replaces a sequence of the same character by a shorter sequence which contains a numeric that indicates the number of characters in the original sequence. The actual method by which run-length coding is affected can vary, although the operational result is essentially the same. For example, consider the sequence ********, which might represent a portion of a heading. Here the sequence of eight asterisks can be replaced by a shorter sequence, such as Sc*8, where Sc represents a special compression-indicating character which, when encountered by a decompression program, informs the program that run-length encoding occurred. The next character in the sequence, the asterisk, tells the program what character was compressed. The third character in the compressed sequence, 8, tells the program how many compressed characters were in the compressed run-length coding sequence so the program can decompress the sequence back into its original sequence. Because the special compression-indicating character can occur naturally in data, when this technique is used the compression program will add a second character to the sequence when the character appears by itself. Thus, this technique can result in data expansion and explains why the compression-indicating character has to be carefully selected. Another popular method of implementing run-length coding involves using the character to be compressed as the compression-indicating character whenever a sequence of three or more characters occurs. Here, the program converts every sequence of three or more characters to the three characters followed by the character count. Thus, the sequence ******** would be compressed as ***8. Although this method of run-length coding requires one additional character, it eliminates the necessity of inserting an additional compression-indicating character when that character occurs by itself in a data stream.

[0298] The average length {overscore (l)} of these strings or clusters is given by $\overset{\_}{l} = {\sum\limits_{i = 1}^{N}\quad {{P\left( l_{i} \right)}l_{i}}}$

[0299] where N is the number of clusters, l_(i) is the length of the ith cluster, and P(l_(i)) is the probability of the ith cluster. Their entropy is given by $E = {- {\sum\limits_{i = 1}^{N}\quad {{P\left( l_{i} \right)}\log_{2}{P\left( l_{i} \right)}}}}$

[0300] The maximum possible CR for a run length encoding scheme is ${CR}_{\max} = \frac{\overset{\_}{l}}{E}$

[0301] 4.3 Compression, Encoding and Encryption

[0302] Using a data compression algorithms together with an encryption algorithm makes sense for two reasons:

[0303] (i) Cryptanalysis relies on exploring redundancies in the plaintext; compressing a file before encryption reduces these redundancies.

[0304] (ii) Encryption is time-consuming; compressing a file before encryption speeds up the entire process.

[0305] In is important to remember, that if a file to be encrypted, it is very useful to apply data compression to the content of the file before this takes place. The data compression can be reversed after the file has been decrypted. This is advantageous for two distinct reasons. First, the file to be encrypted is reduced in size, thus reducing the overhead caused by encryption. Second, if the original data contained regular patterns, these are made much more random bad the compression process, thereby making it more difficult to “crack” the encryption algorithm. If a system is designed which adds any type of transmission encoding or error detection and recovery, then it should be added after encryption. If there is noise in the communications path, the decryptions error-extension properties will only make that noise worse. FIG. 4.1 summarises these steps.

[0306] 5 Random Number Generators

[0307] 5.1 Introduction to Random Number Generators

[0308] Random-number generators are not random because they do not have to be. Most simple applications, such as computer games for example, need very few random numbers. However, cryptography is extremely sensitive to the properties of random-number generators. Use of a poor random-number generator can lead to strange correlations and unpredictable results. If a security algorithm is designed around a random-number generator, spurious correlations must be avoided at all costs.

[0309] The problem is that a random-number generator does not produce a random sequence. In general, random number generators do not necessarily produce anything that looks even remotely like the random sequences produced in nature. However, with some careful tuning, they can be made to approximate such sequences. Of course, it is impossible to produce something truly random on a computer. As John von Neumann states, “Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin”. Computers are deterministic—stuff goes in at one end, completely predictable operations occur inside, and different stuff comes out the other end. Put the same data into two identical computers, and the same data comes out of both of them (most of the time!).

[0310] A computer can only be in a finite number of states (a large finite number, but a finite number nonetheless), and the data that comes out will always be a deterministic function of the data that went in and the computer's current state. This means that any random-number generator on a computer (at least, on a finite-state machine) is, by definition, periodic. Anything that is periodic is, by definition, predictable and can not therefore be random. A true random-number generator requires some random input; a computer can not provide this.

[0311] 5.1.1 Pseudo-Random Sequences

[0312] The best a computer can produce is a pseudo-random-sequence generator. Many authors have attempted to define a pseudo-random sequences formally. In this section an general overview is given.

[0313] A pseudo-random sequence is one that looks random. The sequence's period should be long enough so that a finite sequence of reasonable length—that is, one that is actually used—is not periodic. If for example, a billion random bits is required, then a random sequence generator should not be chosen that repeats after only sixteen thousand bits. These relatively short nonperiodic sequences should be as indistinguishable as possible from random sequences. For example, they should have about the same number of ones and zeros, about half the runs (sequences of the same bit) should be of length one, one quarter of length two, one eighth of length three, and so on. In addition, they should not be compressible. The distribution of run lengths for zeros and ones should be the same. These properties can be empirically measured and then compared with statistical expectations using a chi-square test.

[0314] For our purpose, a sequence generator is pseudo-random if it has the following property:

[0315] Property 1: It looks random, which means that it passes all the statistical tests of randomness that we can find.

[0316] Considerable effort has gone into producing good pseudo-random sequences on a computer. Discussions of generators abound in the academic literature, along with various tests of randomness. All of these generators are periodic (there is no exception); but with potential periods of 2²⁵⁶ bits and higher, they can be used for the largest applications.

[0317] The problem with all pseudo-random sequences is the correlations that result from their inevitable periodicity. Every pseudo-random sequence generator will produce them if they are use extensively; this fact is often used by a cryptanalyst to attack the system.

[0318] 5.1.2 Cryptographically Secure Pseudo-Random Sequences

[0319] Cryptographic applications demand much more of a pseudo-random-sequence generator than do most other applications. Cryptographic randomness does not mean just statistical randomness. For a sequence to be crypto-graphically pseudo-randomly secure, it must also have the following property:

[0320] Property 2: It is unpredictable. It must be computationally non-feasible to predict what the next random bit will be, given complete knowledge of the algorithm or hardware generating the sequence and all of the previous bits in the stream.

[0321] Cryptographically secure pseudo-random sequences should not be compressible, unless the key is known. The key is related to the seed used to set the initial state of the generator.

[0322] Like any cryptographic algorithm, cryptographically secure pseudo-random-sequence generators are subject to attack. Just as it is possible to break an encryption algorithm, it is possible to break a cryptographically secure pseudo-random-sequence generator. Making generators resistant to attack is what cryptography is all about.

[0323] 5.1.3 Real Random Sequences

[0324] Is there such a thing as randomness? What is a random sequence? How do you know if a sequence is random? Is for example “101110100” more random than “101010101”? Quantum mechanics tells us that there is honest-to-goodness randomness in the real world but can we preserve that randomness in the deterministic world of computer chips and finite-state machines?

[0325] Philosophy aside, from our point of view, a sequence generator is real random if it has this additional third property.

[0326] Property 3: It cannot be reliably reproduced. If the sequence generator is run twice with the exact same input (at least as exact as computationally possible), then the sequences are completely unrelated; their cross-correlation function is effectively zero.

[0327] The output of a generator satisfying the three properties given above is good enough for a one-time pad, key generation, and other cryptographic applications that require a truly random sequence generator. The difficulty is in determining whether a sequence is really random. If a string is repeatedly encrypted with DES and a given key, then a random-looking output will be obtained. It will not be possible to tell whether it is non-random unless time is rented on a DES cracker.

[0328] 5.2 Cryptography and Random Numbers

[0329] Many authors suggest the use of random number generator functions in the math libraries which come with many compilers (e.g. the rand( ) function which is part of most C/C++compilers). Such generator functions are insecure and to be avoided for cryptographic purposes.

[0330] For cryptography, what is required is values which can not be guessed by an adversary any more easily than by trying all possibilities (“brute force” or “exhaustive search” strategies). There are several ways to acquire or generate such values, but none of them is guaranteed. Therefore, the selection of a random number source is a matter of art and assumptions.

[0331] There are a few simple guidelines to follow when using random number generators:

[0332] (i) Make sure that the program calls the generator's initialisation routine before it calls the generator.

[0333] (ii) Use seeds that are “somewhat random”, i.e. have a good mixture of bits. For example 2731774 and 10293082 are “safer” than 1 or 4096 (or some other power of two).

[0334] (iii) Note that two similar seeds (e.g. 23612 and 23613) may produce sequences that are correlated. Thus, for example, avoid initialising generators on different processors or different runs by just using the processor number or the run numbers as the seed.

[0335] (iv) Never trust the random number generator provided on a computer, unless someone who has a lot of expertise in this area can personally guarantee that it is a good generator. (N.B. This does not include guarantees from the computer vendor).

[0336] 5.3 Linear Congruential Generators

[0337] The most popular method for creating random sequences is the linear congruential method, first introduced by D H Lehmer in 1949. The algorithm requires four parameters:

[0338] m, the modulus: m>0

[0339] a, the multiplier: 0<a<m

[0340] the increment: 0<c<m

[0341] x₀, the seed or starting value: 0<x₀<m

[0342] The sequence of random numbers is then generated from recursion relation,

x _(n+1)=(ax _(n) +c) mod(m), n>0

[0343] The essential point to understand when employing this method is that not all values of the four parameters produce sequences that pass all the tests for randomness. All such generators eventually repeat themselves cyclically, the length of this cycle (the period) being at most m. When c=0, the algorithm, is faster and referred to as the multiplicity congruential method and many authors refer to mixed congruential methods when c=0.

[0344] To discuss all the mathematical justifications for the choice of m, c, a and x₀ is beyond the scope of this work. We therefore give a brief summary of some of the principal considerations.

[0345] For long periods, m must be large. The other factor to be considered in choosing m is the speed of the algorithm. Computing the next number in the sequence requires division by m and hence a convenient choice is the word size of the computer.

[0346] Perhaps the most subtle reasoning involves the choice of the multiplier a such that a cycle of period of maximum length is obtained. However, a long period is not the sole criterion that must be satisfied. For example, a=c=1, gives a sequence which has a maximum period m but is anything but random. It is always possible to obtain the maximum period but a satisfactory sequence is not always attained. When m is the product of distinct primes only a=1 will produce a full period, but when m is divisible by a high power of some prime, there is considerable latitude in the choice of a.

[0347] The following theorem dictates the choices that give a maximum period.

[0348] Theorem 5.1

[0349] The linear congruential sequence defined by a, m, c and x₀ has period of length m if and only if,

[0350] (i) c is relatively prime to m;

[0351] (ii) b=a−1 is a multiple of p for every prime p dividing m;

[0352] (iii) b is a multiple of 4, if m is a multiple of 4.

[0353] Traditionally uniform random number generators produce floating point numbers between 0 and 1, with other ranges obtainable by translation and scaling.

[0354] 5.4 Data Sizing

[0355] In general, ciphering techniques based on random number sequences cause an enlargement of data size as a result of the ciphering process. In this section a rough estimation is made of dependence between the length of the input buffer and the output buffer.

[0356] Suppose the length of the input buffer is equal to n. If we combine all the input data to obtain a binary sequence, how many regions does it consist of? At most, it can include N=8n regions (if this sequence, from start to finish is something like “010101 . . . ” or “101010 . . . ”). At least, it can include 1 region (if the sequence, from start to finish is of the type “1111 . . . ” or “0000 . . . ”). If we restrict the maximum number of bits in a region to be equal to P, then the minimum number of regions will be least integer greater or equal to 8n/P. The average number of regions N_(av) in a bit sequence will roughly be given by $N_{a\quad v} = {\frac{{8n} + {8{n/P}}}{2} = {4n\frac{1 + P}{P}}}$

[0357] We require log₂(Q+P)+1 bits to store the number of bits in any region where Q is the maximum value of the random number sequence. Hence, a bit sequence of length 8n, after ciphering will have an average length of $N_{a\quad v} = {4n{\frac{1 + P}{P}\left\lbrack {{\log_{2}\left( {Q + P} \right)} + 1} \right\rbrack}}$

[0358] Dividing the final number of bits by the original, we obtain ${4n{\frac{1 + P}{P}\left\lbrack {{\log_{2}\left( {Q + P} \right)} + 1} \right\rbrack}\frac{1}{{\delta \quad n}\quad}} = {\frac{1 + P}{2P}\left\lbrack {{\log_{2}\left( {Q + P} \right)} + 1} \right\rbrack}$

[0359] Consider the case when P=8 and Q=8. Then after ciphering, the length of data will increase 3.375 times. Thus, an input of 512 bytes produces an output of approximately 1.8 Kb.

[0360] Cyphering techniques based on random number sequences depend on the following critical values:

[0361] (i) P—maximum number of bits in the bit segment;

[0362] (ii) Q—maximum value of random number sequence;

[0363] (iii) Order of bits (from left to right or reverse) in the bit field;

[0364] (iv) Placement of bit type in the bit field (leftmost or rightmost);

[0365] (v) The seed value and other parameters associated with the random number iterator.

[0366] From the list above, only the last point should be used as key values in a communications process. A ciphering technique should allow control over the rest of the parameters, although they could be derived once the seed is selected (which is the essential key parameter).

[0367] How can we increase the general security of the whole algorithm? One way is to make several iterations of the algorithm one after another, changing the initial conditions every time. The following data is then required to perform decoding: the number of iterations; the seed for each iteration sequence. This however, leads to a significant enlargement of the resulting data.

[0368] 6 Chaos

[0369] 6.1 Introduction

[0370] Chaos is derived from a Greek verb that means “to gage open”, but in our society, chaos evokes visions of disorder. In a sense, chaotic systems are in unstable equilibrium; even the slightest change to the initial conditions of the system at time t leads the system to a very different outcome at some arbitrary later time. Such systems are said to have a sensitive dependence on initial conditions.

[0371] Some system models such as that for the motion of planets within our solar system contain many variables, and yet are still relatively accurate. With chaotic systems, however, even when there are hundreds of thousands of variables involved, no accurate prediction of their behaviour can be made. For example, the weather is known to be a chaotic system. Despite the best efforts of beleaguered meteorologists to forecast the weather, they very frequently fail, especially at local levels. There is a famous anecdote about the movement of a butterfly's wings in Tokyo affecting the weather in New York. This is typical of a chaotic system and illustrates its sensitive dependence on initial conditions. Chaotic systems appear in virtually every aspect of life. Traffic patterns tend to be chaotic, the errant manoeuvre of even one car can create an accident or traffic jam that can affect thousands of others. The stock market is a chaotic system because the behaviour of one investor, depending on the political situation or corporation, can alter prices and supply. Politics, particularly the politics of non-democratic societies, is also chaotic in the sense that a slight change in the behaviour of a dominant individual can effect the behaviour of millions. In this sense, democracy can be defined as a “chaos limiting”. In general, chaos is the study of situations in which the slightest actions can have far-reaching repercussions.

[0372] 6.2 The Feigenbaum Diagram

[0373] By way of a short introduction to chaotic systems, we consider the properties of the Feigenbaum diagram which has become an important icon of chaos theory. The diagram is a computer generated image and is necessarily so. That is to say that the details of it can not be obtained without the aid of a computer. Consequently, the mathematical properties associated with its structure would have remained elusive without the computer. This applies to the investigation of most chaotic systems whose properties are determined as much numerical experimentation as they are through the rigours and functional and stability analysis.

[0374] One essential structure seen in the Feigenbaum diagram (an example of which is given in FIG. 5) is the branching which portrays the dynamical behaviour of the iterator x→ax(1−x). Out of the major stem, we see two branches bifurcation, and out of these branches we see two more and so on. This is the period-doubling regime of the iterator.

[0375] For a=4, we have chaos and the points of the final state densely fill the complete interval, i.e. at a=4, chaos governs the whole interval from 0 to 1 (of the dependent axis). This image is called the Feigenbaum diagram because it is intimately connected with the ground breaking work of the physicist Mitchell Feigenbaum. Another point to note is that the chaotic region for 0.9<r<1 bifurcating structures are found at smaller scales (not visible in FIG. 7) which resemble the structures shown for 0.3<r<0.9. In other words the Feigenbaum diagram (like many other “phase space” diagrams) exhibits self-similar. This diagram is therefore an example of a fractal. In general, chaotic systems, if analysed in the appropriate phase space, are characterised by self-similar structures. Chaotic systems therefore produce fractal objects and can be analysed in terms of the fractal geometry that characterises them.

[0376] 6.3 Example of a Chaos Generator: The Verhulst Process

[0377] The encryption techniques reported in this work depend on using a chaos generator instead of, or in addition to a pseudo-random number generator. Although many chaos generators exist and can in principle be used for this purpose, here, we consider one particular chaotic system—the “Verhulst Model”. This model describes the development of some population, influenced by some external environment. It assumes that the population growth rate depends on the current size of population.

[0378] We first normalise the population count by introducing x=P/N where P denotes the current population count and N is the maximum population count in a given environment. The range of x is then from 0 to 1. Let us index x by n, i.e. write x_(n) to refer to the size of the population at time steps n=0,1,2, . . . The growth rate is then measured by x_(n+1)/x_(n). Verhulst postulated that the growth rate at time n should be proportional to 1−x_(n) (the fraction of the environment that is not yet used up by the population at time n). Thus, we can consider a population growth model based on $\frac{x_{n + 1}}{x_{n}} \propto {1 - x_{n}}$

[0379] or after introducing a constant a and rearranging the result, x_(n+1)=ax_(n)(1−x_(n)) which yields the logistic model. Note, this is model used to generate the Feigenbaum diagram discussed in Section 5.1.2, i.e. the iterator x→ax_(n)(1−x).

[0380] Clearly, this process depends on two parameters: x₀ which defines the initial population size (seed value) and a which is a parameter of the process. One can expect that this process (as with any conventional process that can be described by a set of algebraical or differential equations), is of three kinds: (i) It can converge to some value x. (ii) It can be periodic. (iii) It can diverge and tend to infinity. However, this is not the case. The Verhulst generator, for certain initial values, is completely chaotic, i.e. it continues to be indefinitely irregular. This behaviour is compounded in the Feigenbaum diagram (FIG. 5) and is due to the nonlinearity of the iterator. In general, we can define four classes of behaviour depending on value of parameter r.

[0381] (i) 0<r<R₁: the process converges to some value p.

[0382] (ii) R₁<r<R₂: the process is period.

[0383] (iii) R₂<r<R₃: the process is chaotic.

[0384] (iv) R₃<r<0: the process tends to infinity.

[0385] The specific values of R₁, R₂ and R₃ depend on the seed value, but the general pattern remains the same. The region R₂<r<R₃ can be used for random number generation.

[0386] Another feature of this process is its sensitivity to the initial conditions. This effect is one of the central ingredients of what is called deterministic chaos. The main idea here is that any (however small) change in the initial conditions leads, after many iterations to a completely different resulting processes. In this sense, we cannot predict the development of this process at all due the impossibility of infinitely exact computations. However, we need to strictly determine the rounding rules which are used in generating a random sequence in order to receive the same results on different systems.

[0387] Many other chaos generators exist. In most cases they are compounded by iterative processes which are inherently nonlinear. This is not to say that all nonlinear processes produce chaos, but that chaotic processes are usually a result of nonlinear systems. A further discussion of this important issue is beyond the scope of this report.

[0388] 7 Fractals

[0389] 7.1 Fractal Geometry

[0390] Geometry, with its roots in ancient Greece, first dealt with the mathematically simplistic forms of spheres, cones, cubes etc. These exact forms, however, rarely occur naturally. A geometry suitable for describing natural objects—Fractal Geometry—was constructed this century and has only relatively recently (over the past twenty years)been research properly. This revolutionary field deals with shapes of infinite detail, such as coastlines, the branching of a river delta or nebulous forms of clouds for example and allows us to define and measure the properties of such objects. This measure is compounded in a metric called the Fractal Dimension.

[0391] Fracals arise in many diverse areas, from the complexity of natural phenomenon to the dynamic behaviour of nonlinear systems. Their striking wealth of detail has given them an immediate presence in our collective consciousness. Fractals are the subject of research by artists and scientists alike, making their study one of the truly renaissance activities of the late 20th century.

[0392] Definition

[0393] Unfortunately, a good definition of a fractal is elusive. Any particular definition either exclude sets that are thought of as fractals or to include sets that are not thought of as fractals. The definition of a ‘fractal’ should be regarded in the same way as the biologist regards the definition of ‘life’. There is no hard and fast definition, but just a list of properties and characteristic of a living thing. In the same way, it seems best to regard a fractal as a set that has properties such as those listed below, rather than to look for a precise definition which will almost certainly exclude some interesting cases.

[0394] If we consider a set F to be a fractal, then it should possess (some) of the following properties:

[0395] (i) F has detail at every scale.

[0396] (ii) F is (exactly, approximately, or statistically) self-similar.

[0397] (iii) The ‘Fractal Dimension’ of F is greater than its topological dimension.

[0398] (iv) There is a simple algorithmic description of F.

[0399] 6.2 The Similarity (Fractal) Dimension

[0400] Central to fractal geometry is the concept of self-similarity, which means that some types of mainly naturally occurring objects look similar at different scales. Self-similar objects are compounded by a parameter called the ‘Similarity Dimension’ or the ‘Fractal Dimension’, D. This is defined as $\begin{matrix} {{N\quad \tau^{D}} = {{1\quad {or}\quad D} = {- \frac{\ln \quad N}{\ln \quad \tau}}}} & (6.1) \end{matrix}$

[0401] where N is the number of distinct copies of an object which has been scaled down by a ratio r in all co-ordinates. There are two distinct types of fractals which exhibit this property:

[0402] (i) Deterministic Fractals;

[0403] (ii) Random Fractals.

[0404] Deterministic fractals are objects which look identical at all scales. Each magnification reveals an ever finer structure which is an exact replication of the whole, i.e. they are exactly self-similar. Random fractals do not, in general, possess such deterministic self-similarity; such fractal sets are composed of N distinct subsets, each of which is scaled down by a ratio r from the original and is identical in all statistical respects to the scaled original—they are statistically self-similar. The scaling ratios need not be the same for all scaled down copies. Certain fractals sets are composed of the union of N distinct subsets, each of which is scaled down by a ratio r_(i)<1, 1≦i≦N from the original in all co-ordinates. The similarity dimension is given by the generalisation of Eq. (6.1), namely ${\sum\limits_{i = 1}^{N}\quad \tau_{i}^{D}} = 1$

[0405] A further generalisation leads to self-affine fractals sets which are scaled by different ratios in different co-ordinates. The equation

ƒ(λx)=λ^(H)ƒ(x) ∀λ>0   (6.2)

[0406] where λ is a scaling factor and H is a scaling exponent implies that a scaling in the x co-ordinate by λ gives a scaling of the ƒ coordinate by a factor λ^(H). A special case of Eq. (6.2) occurs when H=1; in this case, we have a scaling of x by λ producing a scaling of ƒ by λ, i.e. ƒ(x) is self-similar.

[0407] Naturally occurring fractals differ from strictly mathematically defined fractals in that they do not display statistical or exact self-similarity over all scales but exhibit fractal properties over a limited range of scales.

[0408] 6.3 Random Fractals

[0409] 6.3.1 Classical Brownian Motion

[0410] There are many examples in the field of physics, chemistry and biology of random processes. Brownian motion is a relevant mathematical model for many such physical processes. These processes display properties which have now been shown to be best described as fractal processes.

[0411] In Brownian motion, the position of a particle at one time is not independent of the particles motion at a previous time. It is the increments of the position that are independent. Brownian motion in 1D is seen as a particle moving backwards and forwards on the x-axis for example. If we record the particles position on the x-axis at equally spaced time intervals, then we end up with a set of points on a line. Such a point-set is self-similar.

[0412] On the other hand, if we include time as an extra co-ordinate and plot the particles position against time—called the record of the motion—we obtain a point-set that is self-affine. In Section 6.3.2, we give an example of a physical process that has been modelled by Brownian motion.

[0413] 6.3.2 Diffusion as an Example of Brownian Motion

[0414] For a particle moving in 1D (along the x-axis), consider the following model for its motion. At time interval τ a displacement (or increment) ξ is chosen at random from a Gaussian probability distribution given by ${P\left( {\xi {,\quad}\tau} \right)} = {\frac{1}{\sqrt{4{\pi\rho\tau}}}{\exp \left( {- \frac{\xi^{2}}{4{\rho\tau}}} \right)}}$

[0415] where ρ is the diffusion coefficient. The probability of finding ξ in the range ξ to ξ+dξ is P(ξ, τ)dξ and the sequence of the increments {ξ_(i)} is a set of independent Gaussian random variables. The variance of the process is ⟨ξ²⟩ = ∫_(−∞)^(∞)ξ²P(ξ,  τ)  ξ = 2ρτ

[0416] where

•

denotes the expectation. The position of the particle at time t is then ${x(t)} = {\sum\limits_{i = 1}^{n}\quad {\xi \quad i}}$

[0417] Normally, for convenience, the extra condition x(0)=0 is imposed.

[0418] 6.2.3 Scaling Properties

[0419] Suppose that we observe the motion not at intervals τ, but at intervals λτ where λ is some arbitrary number. For example, if λ=2, the increment ξ during time interval t to t+2τ will be given by ξ=ξ₁+ξ₂ where ξ₁ is the increment in time interval t to t+τ and ξ₂ is the increment in time interval t+τ to t+2τ. ξ₁ and ξ₂ are independent increments and hence the joint probability P(ξ₁:ξ₂, τ), that the first increment is the range ξ₁ to ξ₁+dξ₁ and the second increment is in the range ξ₂ to ξ₂+dξ₂ is given by

P(ξ₁:ξ₂, τ)=P(ξ₁, τ)P(ξ₂, τ)

[0420] Hence the probability density for ξ is given by integrating over all possible combinations of increments ξ₁ and ξ₂ such that ξ=ξ₁+ξ₂, i.e. ${P\left( {\xi,\quad 2\tau} \right)} = {{\int_{- \infty}^{\infty}{{P\left( {\xi - {\xi_{1},\quad \tau}} \right)}{P\left( {\xi_{1},\quad \tau} \right)}\quad {\xi_{1}}}} = {\frac{1}{\sqrt{4\pi \quad {\rho 2\tau}}}{\exp \left( {- \frac{\xi^{2}}{4{\rho 2\tau}}} \right)}}}$

[0421] Therefore, if the particle is viewed with half the time resolution, the increments are still a random Gaussian process with

ξ

=0, but with variance now given by

ξ²

=2×2ρτ, i.e. twice the value obtained when the process is viewed at intervals τ. In general, for observations at time interval λ, we obtain ${P\left( {\xi,\quad {\lambda\tau}} \right)} = {\frac{1}{\sqrt{4{\pi\rho\lambda\tau}}}{\exp \left( {- \frac{\xi^{2}}{4{\rho\lambda\tau}}} \right)}}$

[0422] where

ξ

=0 and

ξ²

=λ×2ρτ. Note, that with τ=λτ and ξ=λ^(1/2)ξ, we have

P(ξ=λ^(1/2)ξ, τ=λτ)=λ^(−1/2) P(ξ, τ)

[0423] which is the scaling relation for the probability density. The above equation shows that the Brownian process is invariant in its statistical distribution under a transformation that changes the time scaled by a factor λ and the length scale by a factor λ^(1/2). The name given to such transformations is affine and the curves or records that reproduce themselves in some sense under transformations of this type are called self-affine.

[0424] We may also find the probability distribution for the particle position x(t) by noting that

P[x(t)−x(t ₀)]=P[x(t)−x(t ₀), t−t ₀]

[0425] which gives ${P\left\lbrack {{x(t)} - {x\left( t_{0} \right)}} \right\rbrack} = {\frac{1}{\sqrt{\left. {4{\pi\rho}} \middle| {t - t_{0}} \right|}}{\exp \left( {- \frac{\left\lbrack {{x(t)} - {x\left( t_{0} \right)}} \right\rbrack^{2}}{\left. {4\rho} \middle| {t - t_{0}} \right|}} \right)}}$

[0426] and satisfies the scaling relation

P[λ ^(−1/2) , x(λt)−x(λt ₀)]=λ^(−1/2) P[x(t)−x(t ₀)]

[0427] In the above equation x(t₀) is the particles position at some arbitrary reference time.

[0428] Finally expressions for the mean, mean absolute and the variance of the particles position can be derived and are given respectively by

x(t)−x(t₀)

=0 ${\langle\left| {{x(t)} - {x\left( t_{0} \right)}} \right|\rangle} = \left. \sqrt{\frac{4\rho}{\pi}} \middle| {t - t_{0}} \right|^{\frac{1}{2}}$

 

[x(t)−x(t ₀)]²

=2 ρ|t−t ₀|

[0429] For ξ a normalised independent Gaussian random process, we then have $\left. {{{x(t)} - {x\left( t_{0} \right)}} \propto \xi} \middle| {t - t_{0}} \right|^{\frac{1}{2}}$

[0430] This result can be generalised to the form

x(t)−x(t ₀)∝ξ|t−t ₀|^(H), 0<H<1

[0431] which provides the basis for Fractional Brownian Motion.

[0432] Fractional Brownian Motion is an example of statistical fractal geometry and is the basis for the coding technique discussed in the following chapter (albeit via a different approach which introduces fractional differentiation).

[0433] 7 Random Fractal Coding

[0434] In this chapter, the theoretical basis is provided of Random Fractal Coding in which random fractals are used to code binary data in terms of variations in the fractal dimension such that the resulting fractal signals are characteristic of the background noise associated with the medium (HF radio, microwave, optical fibre etc.) through which information is to be transmitted. This form of ‘data camouflaging’ is of value in the transmission of sensitive information particularly for military communications networks and represents an alternative and potentially more versatile approach to the spectral broadening techniques commonly used to scramble signals.

[0435] The basic idea is to disguise the transmission of a bit stream by making it ‘look like’ background noise which spectral broadening does not attempt to do. Thus instead of transmitting a frequency modulated signal (in which 0 and 1 are allocated different frequencies), a fractal signal is transmitted in which 0 and 1 are allocated different fractal dimensions.

[0436] 7.1 Introduction

[0437] The application of random fractal geometry for modelling naturally occurring signals (noise) and visual camouflage is well known. This is due to the fact that the statistical and spectral characteristics of random fractals are consistent with many objects found in nature; a characteristic which is compounded in the term ‘statistical self-affinity’. This term refers to random processes which have similar distributions at different scales. For example, a random fractal signal is one whose distribution of amplitudes remains the same whatever the scale over which the signal is sampled. Thus, as we zoom into a random fractal signal, although the pattern of amplitude fluctuations change, the probability density distribution of these amplitudes remains the same. Many noises found in nature are statistically self-affine including transmission noise.

[0438] The technique discussed in this section is based on converting bit streams into sequences of random fractal signals with the aim of making these signal indistinguishable from the background noise of the system through which information is transmitted. This method of data camouflage has applications in military communications systems in which binary data is scrambled and transmitted in a form that appears to be “like” the background “static” of the system. This relies significantly on the type and accuracy of the model that is chosen to simulate transmission noise.

[0439] 8.2 Digital Communications Systems and Data Camouflaging

[0440] A Digital Communications Systems is a system that is based on transmitting and receiving bit streams (binary sequences). The basic processes involved are given below.

[0441] (i) Digital signal (speech, video etc.)

[0442] (ii) Conversion from floating point to binary form.

[0443] (iii) Modulation and transmission.

[0444] (iv) Demodulation and reception of binary sequence +transmission noise.

[0445] (v) Reconstruction of digital signal.

[0446] In the case of sensitive information, an additional step is required between stages (ii) and (iii) above where the binary form is coded according to a classified algorithm. Appropriate decoding is then introduced between stages (iv) and (v) with suitable pre-processing to reduce the effects of transmission noise for example. In addition, scrambling methods can be introduced during the transmission phase. The conventional approach to this is to apply “Spectral Broadening”. This is where the spectrum of the signal is distorted by adding random numbers to the out-of-band component of the spectrum. The original signal is then recovered by lowpass filtering. This approach requires an enhanced bandwidth but is effective in the sense that the signal can be recovered from data with a very low signal-to-noise ratio. From the view of transmitting sensitive information, the approach discussed above is ideal in that recovery of the information being transmitted is very difficult for any unauthorised reception. However, in this approach to data scrambling it is clear that information is being transmitted of a sensitive nature to any unauthorised reception. In this sense, the information is not camouflaged. The purpose of fractal coding is to try and make the information content of the transmission phase “look like” transmission noise so that any unauthorised receipt is incapable of distinguishing between the transmission of sensitive information and background “static”. For this purpose, the research reported here, has focused on the design of algorithms which encode binary sequences in terms of a unique set of fractal parameters which can then be used to produce a new digital (random fractal) signal which is characteristic of transmission noise. These fractal parameters represent main key(s) to this type of encryption. The principal criteria that have been adopted are as follows:

[0447] (i) The algorithm must produce a signal whose characteristics are compatible with a wide range of transmission noise.

[0448] (ii) The algorithm must be invertable and robust in the presence of genuine transmission noise (with low Signal-to-Noise Ratios).

[0449] (iii) The data produced by the algorithm should not require greater bandwidth than that of a conventional system.

[0450] (iv) The algorithm should ideally make use of conventional technology, i.e. digital spectrum generation (FFT), real-time correlators etc.

[0451] 8.3 Models for Transmission Noise

[0452] The ideal approach for developing a model for transmission noise is to analyse the physics of a transmission system. There are a number of problems with his approach. First, the physical origins of many noise types are not well understood. Secondly, conventional approaches for modelling noise fields usually fail to accurately predict their characteristics. There are two principal approaches to defining the characteristics of a noise field:

[0453] (i) The Probability Distribution Function (PDF)—the shape or envelope of the distribution of amplitudes of the field.

[0454] (ii) The Power Spectral Density Function (PSDF) of the noise the shape or envelope of the power spectrum.

[0455] On the basis of these characteristics, nearly all noise field have two fundamental characteristics:

[0456] (i) The PSDF is characterized by irrational power laws.

[0457] (ii) The field is self-affine.

[0458] Here, we consider a phenomenological approach which is based on a power law that can be used to describe a range of PSDFs and is consistent with the signal being statistically self-affine.

[0459] We consider a PSDF of the form ${P(\omega)} = \frac{A\quad \omega^{2_{g}}}{\left( {\omega_{0}^{2} + \omega^{2}} \right)^{q}}$

[0460] where g and q are positive (floating point) numbers, A is a scaling factor and ω₀ is the characteristic frequency of the spectrum. This model is a generalisation of three distinct PSDFs used for stochastic modelling:

[0461] (i) Fractional Brownian Motion (g=0, ω₀=0)

[0462] (ii) Ornstein-Uhlenbeck model $\left( {g = {{\frac{1}{2},\quad q} = 1}} \right)$

[0463] (iii) Bermann process (q=1)

[0464] For ω>0 and q>g, the PSDF P(ω) is has as maximum when when ω=ω₀{square root}{square root over (g/(q−g))}. The value of P(ω) at this point is ${P(\omega)} = {A\quad \omega_{0}^{2{({g - g})}}\frac{g^{g}}{q^{q}}\left( {q - g} \right)^{q - g}}$

[0465] Beyond this point, the PSDF decays and its asymptotic form is dominated by a ω^(−2q) power law which is consistent with random fractal signals. At low frequencies, the PSDF is characterised by the term ω^(2g)

[0466] The complex spectrum of the noise can then be written as

N(ω)=H _(gq)(ω)W(ω)

[0467] where H_(g,q) is the transfer function given by (B={square root}{square root over (A)}) $H_{gq} = \frac{{B\left( {\quad \omega} \right)}^{g}}{\left( {\omega_{0} + \quad {\quad \omega}} \right)^{q}}$

[0468] and W(ω) is the complex spectrum of ‘Gaussian white noise’ (δ—uncorrelated noise). Here, the term ‘Gaussian white noise’ is defined conventionally as Gaussian noise (i.e. noise with a zero mean Gaussian distribution of amplitudes) whose PSDF is a constant. The noise field n(t) as a function of time t is then given by the inverse Fourier transform of N(ω), i.e. ${n(t)} = {{\frac{1}{2\pi}{\int_{- \infty}^{\infty}{{N(\omega)}{\exp \left( {\quad \omega \quad t} \right)}\quad {\omega}}}} = {\frac{B}{\Gamma (q)}{\int_{- \infty}^{t}{\frac{^{- {\omega_{0}{({t - \tau})}}}}{\left( {t - \tau} \right)^{1 - q}}\frac{^{g}}{\tau^{g}}{\omega (\tau)}\quad {\tau}}}}}$

[0469] where ${w(t)} = {\frac{1}{2\pi}{\int_{- \infty}^{\infty}{{W(\omega)}{\exp \left( {{\omega}\quad t} \right)}\quad {\omega}}}}$

[0470] This new integral transform is an example of a fractional integral transform and contains a fractional derivative as part of its integrand.

[0471] Scaling Characteristics

[0472] The scaling characteristics of this transform can be investigated by considering the function $\begin{matrix} {{n^{\prime}\left( {t,\quad \omega_{0}} \right)} = {\frac{1}{\Gamma (q)}{\int_{- \infty}^{t}{\frac{\exp \left\lbrack {- {\omega_{0}\left( {t - \tau} \right)}} \right\rbrack}{\left( {t - \tau} \right)^{1 - q}}\frac{^{g}}{\tau^{g}}{n({\lambda\tau})}\quad {\tau}}}}} \\ {= {{\frac{\lambda^{g}}{\lambda^{q}}\frac{1}{\Gamma (q)}{\int_{- \infty}^{\lambda \quad t}{\frac{\exp \left\lbrack {{- \frac{\omega_{0}}{\lambda}}\left( {{\lambda \quad t} - \tau} \right)} \right\rbrack}{\left( {{\lambda \quad t} - \tau} \right)^{1 - q}}\frac{^{g}}{\tau^{q}}{n(\tau)}\quad {\tau}}}} = {\frac{\lambda^{g}}{\lambda^{q}}{n\left( {\lambda \quad t,\quad {\omega_{0}/\lambda}} \right)}}}} \end{matrix}$

[0473] Hence, the scaling relationship for this model is ${\Pr \left\lbrack {n^{\prime}\left( {t,\quad \omega_{0}} \right)} \right\rbrack} = {\frac{\lambda^{g}}{\lambda^{q}}{\Pr \left\lbrack {n\left( {\lambda \quad t,\quad {\omega_{0}/\lambda}} \right)} \right\rbrack}}$

[0474] where Pr[ ] denotes the probability density function. Here, as we scale t by λ, the characteristic frequency ω₀ is scaled by 1/λ. The interpretation of this result, is that as we zoom into the signal ƒ(t), the distribution of amplitudes (i.e. the probability density function) remains the same (subject to a scaling factor of λ^((g−q))) and the characteristic frequency of the signal increases by a factor of 1/λ.

[0475] 7.4 Random Scaling Fractal Signals

[0476] Given the PSDF ${P(\omega)} = \frac{A\quad \omega^{2g}}{\left( {\omega_{0}^{2} + \omega^{2}} \right)^{q}}$

[0477] a random scaling fractal signal is obtained by setting g=0 and ω₀=0 We can then write ${P(\omega)} = \frac{A}{\quad \omega^{2q}}$

[0478] where q is defined in terms of the fractal dimension D (1<D<2) via the formula $q = \frac{5 - {2D}}{2}$

[0479] This result is consistent with the spectral noise model (ignoring scaling constant A) $\begin{matrix} {{N(\omega)} = \frac{W(\omega)}{\left( {\quad \omega} \right)^{q}}} \\ {{n(t)} = {{\hat{R}\left\lbrack {n(t)} \right\rbrack} = {\frac{1}{\Gamma (q)}{\int_{- \infty}^{t}{\frac{\omega \quad (\tau)}{\left( {t - \tau} \right)^{1 - q}}{\tau}}}}}} \end{matrix}$

[0480] which is a fractional integral transform known as the Riemann-Liouville transform. Note, that n can be considered to be a solution to the fractional stochastic differential equation ${\frac{^{q}}{t^{q}}{n(t)}} = {\omega \quad (t)}$

[0481] Also, the Riemann-Liouville integral has the following fundamental property $\begin{matrix} {{n^{\prime}(t)} = {{\hat{R}\left\lbrack {\omega \left( {\lambda \quad t} \right)} \right\rbrack} = {\frac{1}{\lambda^{q}}{n\left( {\lambda \quad t} \right)}}}} \\ {{\Pr \left\lbrack {n^{\prime}(t)} \right\rbrack} = {\frac{1}{\lambda^{q}}{\Pr \left\lbrack {n\left( {\lambda \quad t} \right)} \right\rbrack}}} \end{matrix}$

[0482] which describes statistical self-affinity.

[0483] 7.5 Algorithm for Computing Fractal Noise and the Fractal Dimension

[0484] The theoretical details discussed in the last section allow the following algorithm to be developed to generate fractal noise using a Fast Fourier Transform (FFT).

[0485] Step 1. Compute a pseudo-random (floating point) number sequence ω_(i); i=0, 1, . . . , N−1 using the Linear Congruential Method discussed in Chapter 4.

[0486] Step 2. Compute the Discrete Fourier Transform (DFT) of ω_(i) giving W_(i) (complex vector) using a standard FFT algorithm.

[0487] Step 3. Filter W_(i) with 1/ω_(i) ^(q) where q=(5−2D)/2, 1<D<2 and D—the Fractal Dimension of the signal—is defined by the user.

[0488] Step 4. Inverse DFT the result using a FFT to obtain n_(i) (real part of complex vector).

[0489] Inverse Solution

[0490] The inverse problem is then defined thus: Given n_(i) compute D. One obvious approach to this problem (one which is consistent with the theory given in Section 7.4) is to estimate D from the power spectrum of n_(i) whose expected form (for the positive half space) is ${{{\hat{P}}_{i} = \frac{A}{\omega_{i}^{\beta}}};{\beta = {2q}}},{\omega_{i} > {0{\forall i}}}$

[0491] Consider

e(A, β)=∥ ln P _(i)−ln {circumflex over (P)}_(i)∥₂ ²

[0492] where P_(i) is the power spectrum of n_(i).

[0493] Solving the equations (least squares method) ${\frac{\partial e}{\partial\beta} = 0};{\frac{\partial e}{\partial A} = 0}$

[0494] gives $\begin{matrix} {\beta = {\frac{{N{\sum\limits_{i}{\left( {\ln \quad P_{i}} \right)\left( {\ln \quad \omega_{i}} \right)}}} - {\left( {\sum\limits_{i}{\ln \quad \omega_{i}}} \right)\left( {\sum\limits_{i}{\ln \quad P_{i}}} \right)}}{{N{\sum\limits_{i}\left( {\ln \quad \omega_{i}} \right)^{2}}} - \left( {\sum\limits_{i}{\ln \quad \omega_{i}}} \right)^{2}}\quad {and}}} \\ {A = {\exp \left( \frac{{\sum\limits_{i}{\ln \quad P_{i}}} + {\beta {\sum\limits_{i}{\ln \quad \omega_{i}}}}}{N} \right)}} \end{matrix}$

[0495] The algorithm required to implement this inverse solution can therefore be summarised as follows:

[0496] Step 1. Compute the power spectrum P_(i) of fractal noise n_(i) using a FFT.

[0497] Step 2. Extract the positive half space data.

[0498] Step 3. Compute β using the formula above.

[0499] Step 4. Compute the Fractal Dimension D=(5−β)/2.

[0500] This algorithm provides a reconstruction of D that is on average accurate to 2 decimal places for N>64.

[0501] 7.6 Fractal Coding of Binary Sequences

[0502] The method of coding involve generating fractal signals in which two fractal dimensions are used to differentiate between a zero bit and a non-zero bit. The technique is outlined below.

[0503] (i) Given a binary sequence, allocate D_(min) to bit=0 and D_(max) to bit=1.

[0504] (ii) Compute a fractal signal of length N for each bit in the sequence.

[0505] (iii) Combine the results to produce a continuous stream of fractal noise.

[0506] In each case, the number of fractals per bit can be increased. This has the result of averaging out the variation in the estimates of the fractal dimensions. The information retrieval problem is then solved by computing the fractal dimensions using the Power Spectrum Method discussed in Section 7.5 using a conventional moving window principle to given the fractal dimension signature D_(i). The binary sequence is then obtained from the following algorithm: Given that $\Delta = {D_{\min} + \frac{D_{\max} - D_{\min}}{2}}$

[0507] if

[0508] D_(i)≦Δ

[0509] then bit=0

[0510] else

[0511] if D_(i)>Δ

[0512] then bit=1

[0513] The principal criteria for the optimization of this coding technique (the basis for numerical experiments) is to minimize $D_\rm max-D_\rm min$ subject to accurate reconstruction in the presence of real transmission noise.

[0514] 9. Overview of the Algorithm

[0515] 9.1 Encryption

[0516] The data enciphering algorithm reported in this work uses the Random or Chaotic number generator discussed in Chapters 4 and 5 respectively and the Fractal Coding method discussed in Chapter 7. The algorithm consists of the following steps which provide a general description of each stage of the encryption and decryption process.

[0517] 9.1.1 Encryption Using Runlength Coding

[0518] (i) Assuming the data has been transformed into a bit pattern, segments are extracted where values of the bits are the same, e.g. the bit sequence “011110000” is segmented into three regions as “0”, “1111”, “0000”. At this stage, the maximum number of bits in any region P is determined. In order to efficiently store the resulting data, this number must be power of 2. The type of each segment (i.e. whether it consists of 0's or 1's) is also stord for future use.

[0519] (ii) The total number of segments N is calculated.

[0520] (iii) The number of bits in each region is calculated. Using the example above, we get “1’, “4’, “4’. Note, the size of all segments are in the range [1,P].

[0521] 9.1.2 Encryption by Using Chaotic and Psuedo Random Numbers

[0522] (iv) A sequence of random numbers of length N is generated using a psuedo random number generator or a chaos generator and normalised so that all floating point numbers are in the range [0, 1]. (Negative numbers are not considered because it is not strictly necessery to use them and they require one more bit to store and sign.) These numbers are then scaled and converted into (nearest) integers. The scale can be arbitrary. However, if the maximum value of the sequence is Q−1, then log₂Q log_(2iii)Q bits are required to store any number from the sequence. Thus in order to efficiently use these bits, Q should be a power of 2.

[0523] (v) The numbers from both sequences (i.e. those obtained from run length coding K_(i) and the random integer sequence R_(i) are added together to give a third sequence D_(i)=K_(i)+R_(i). The numbers associated with these new sequence fall into the range[0,P+Q]

[0524] (vi) Each integer in the sequence D_(i) is transformed into its corresponding binary form i.e. to fill some binary field with corresponding data. To store any number, the bit field is required to be of length log₂(Q+P). A further bit is required to store the type (0 or 1). For this purpose the leftmost or rightmost bit of field can be used. It is necessery to use fields of the same size even if some numbers do not fill it completely, otherwise it is not possible to distinguish these combined bit fields during deciphering. The unnecessary bits are filled with 0's.

[0525] (vii) The binary fields are concatenated to give a continuous bit stream.

[0526] 9.1.3 Camouflaging Bit Streams Using Fractal Coding.

[0527] Once the bit stream has been coded [steps (i)-(vii)]it can be camouflaged using the fractal coding scheme discussed in Chapter8. This is important in cases where the transmission of information is required to “look like” the background noise of a system through which information is transmitted. This method involves generating fractal signals in which two fractal dimensions are used to differentiate between a zero bit and a non-zero bit and would in practice replace the frequency modulation (and demodulation) that is currently used in digital communications systems. The basic steps involved are given below for completeness.

[0528] (viii) For bit=0 chose a minimum fractal dimension D_(min) and for to bit=1 allocated a maximum fractal dimension D_(max)

[0529] (ix) Generate a fractal signal of length N (a power of 2) for each bit in the bit stream.

[0530] (x) Concatinate all fractal associate with each bit and transmit.

[0531] 9.2 Decryption

[0532] Decryption of the transmitted fractal signal is obtained using the methods discussed in Section 7.5 to recover the fractal dimensions and thus the coded bit stream. Reconstructing the original binaray sequence from the coded bit stream is then obtained using the inverse of the steps (i)-(vii) given above. This is illustrated in an example given in the following Chapter. A simple high level data flow diagram of this method of encryption is given in FIG. 8

[0533] 10 Prototype Software System—DECFC

[0534] In this chapter, a brief summary is given of a prototype software package (Data Encryption and Camouflage using Fractal and Chaos—DECFC) that has been written to investigate the theoretical principles and algorithmic details presented so far. A detailed discussion of the systems and its software engineering is beyond the scope of this report. The system has been written primarily to research the numerical performanceof the techniques developed and as a work-bench for testing out new ideas.

[0535] 10.1 Hardware Requirements

[0536] In its present form DECFC only requires an IBM PC/AT, or a close compatible, whichis running the MS-DOS or PC-DOS operating system, version 2.0 or above. DECFC requires approximately 4M of RAM over and above the operating system requirements. If the available PC has more than this minimum hardware configuration, then it should not cause any problems. Memory is required over and above the size of this executable file for the system stack.

[0537] 10.2 Software

[0538] DECFC encrypts and decrypts input data. It is a parameter driven operating system utility, i.e. whenever DECFC is executed, it inspects the parameters passed to it and determines what action should be taken. The process of encryption uses a secret encrypted state. Secure key management is at the heart of any encryption system, and DECFC employs the best possible key management techniques that can be achieved with a symmetric encryption algorithm. Key management facilities are all accessed by activating menus available. Encryption and decryption are both performed using a commandline interpreter which can extract the chosen parameters from the DECFC command line. Encryption and decryption are, therefore, ideally suited to batch file operation.where complex file manipulations can be achieved by simply executing the appropriate batch file.

[0539] A two key management is used which contains chaotic or psuedo random encryption key and the camouflage encryption key. Two encryption keys are required for thispurpose. This process has the same effect, in cryptographic strength terms, as using adouble length encryption key. Each single decipherment is replaced by the followingprocess: (i) encipher with Chaotic or Random key; (ii) encipher with Camouflage key.

[0540] Decryption is similarly achieved using: (i) decipher with chaotic or random key; (ii) decipher with camouflage key.

[0541] The camouflage key is stored in encrypted form in a data. It is important to take particular care to ensure that this data is not available to unauthorised users.

[0542] Implementing encryption as a software package has the major advantage thatthe encryption process itself is not a constituent part of the process used to transmit the data. An encrypted message or data file can, therefore, be sent via any type ofmedium. The method of transport does notaffect the encryption. Once received, the data is decrypted using the appropriateencryption key. The original software (i.e. module library) has beendeveloped using Borland Turbo C++ (V3) compiler making extensive use of thegraphics functions available. Attempts have been made to provide clear self-commenting software.

[0543] 10.3 Command Line Switches

[0544] The various facilities available within DECFC are activated by command lineswitches. A single letter acts as a switch character to tell DECFC the type ofcommand to be invoked. Upper and lower case has no significance for the switch characters. These switches are activated by entering the single first letter directly after the prompt from numeric keys. Once activated, they remain active until changed. The command line is passed and acted upon sequentially. The function of each of the command line switch characters is explained briefly below. Main menu choices G—Generate: Generate the user required signal L—Load sig: Load the signal from the saved file Q—Quit: Quit the program\it Generate menu choices P—Parameters: Extract the signal parameters E—Encode: Execute the code menu G—Generate: Generate the encrypted signal D—Decode: Decrypt the signal B—Back: Return to the main menu Code menu choices M—Manual: Generate the manual binary code R—Random: Generate the random binary code L—Load Code: Generate the encrypted code by Random key or Chaotic key B—Back: Return back to the code menu Key menu choices: R—Random key: Create the Random key by user C1—Chaotic key: Create the Chaotic key by user C2—Camouflage key: Create the Camouflage key by user

[0545] 10.4 Windows

[0546] All the information produced by the DECFC system is contained within one of the five “windows” (boxed in areas of the screen). Each window has a designated function which is described below.

[0547] Menu Window. Menu choices are presented to the user in this window and information on the input and output binary sequences given.

[0548] Parameter Window. The fractal parameters are displayed for the user in this window. It provides information on the fractal size, fractals/bit, low fractal dimension and high fractal dimension which are either chosen be the use or given default values.

[0549] Code Window. Input binary data before and after reconstruction is displayed in this window. The reconstructed sequence is superimposed on the original code (dotted line). The original binary sequence and the estimated binary sequence are displayed with red and green lines respectively.

[0550] Signal Window. In this window, data encrypted by random numbers or chaotic numbers andcamouflage coding is displayed for analysis by the user.

[0551] Fractal Dimensions Window. In this window, original and reconstructed fractal dimensions are displayed for analysis by the user.

[0552] 10.5 Example Results

[0553] This section provides a step-by-step example of the encryption system for a simple example input.

[0554] Encryption

[0555] With the execution of the program, the first step is to enter the seed for thepsuedo random number generator which can be any positive integer. This parameter is used to generate the Gaussian white noise used for computing the fractal signals.

[0556] Input data can then be generated either by loading it from a file. In this example, we consider the input

[0557] xc

[0558] The system transforms the characters into ASCII codes

[0559] 120 99

[0560] and from the ASCII codes into a bit sequence

[0561] 0111100001100011

[0562] This bit field is then segmented into fields which consist ofbits of one kind 0 1111 0000 11 000 11

[0563] The number of field $N$ is then computed.

[0564] N=6

[0565] The number of bits in each field is then obtained ($K_(—)0,K_(—)1, . . . ,K_N$) 144232

[0566] A sequence of psuedo-random or chaotic integers ($R_(—)0, R_(—)1, . . . ,R_N$) of length $N$ is then obtained to scamble the data.

[0567] Random key=1;

[0568] 602067

[0569] These number sequences are then added together to give the$D_i=K_i+R_i,\h

[0570] i=0,1, . . . ,N$\startcode\ $\quad {{1\quad 4\quad 4\quad 2\quad 3\quad 2} + \underset{\_}{6\quad 0\quad 2\quad 0\quad 6\quad 7}}$   7  4  6  2  9  9

[0571] Each number of the resulting sequence is transformed to itsbinary equivalent

[0572] 0000111 0000100 0000110 0000010 0001001 0001001

[0573] Concatenating the resulting bit fields into a single bit stream, we obtain 000011100001000000110000001000010010001001

[0574] This encrypted data is shown graphically in FIG. 9 (CODE).

[0575] The bit stream can now be submitted to the fractal coding algorithm In this example, the default values of the fractal parameters are used (these values represent the fractal coding key).

[0576] Fractal size=64 Fractals

[0577] Bit=5

[0578] Low dimension=1.60

[0579] High dimension=1.90

[0580] In this case, five fractal signals, each of length 64 for each bit are computed and concatinated. This provide the fractal signal shown in the fractal window of FIG. 9.

[0581] Decryption

[0582] The information retrieval problem is solved by computing the fractal dimensionsusing the Power Spectrum Method discussed in Chapter 7 using a conventional moving window principle (Fractal Dimension Segmentation) to give the fractal dimension signature D_(i).

[0583] The binary sequence is then obtained from the following algorithm:

[0584] If D_(i)≦Δ then bit=0

[0585] If D_(i)>Δ then bit=1.

[0586] Where

Δ=D_(min)+(D_(max)−D_(min))/2

[0587] The reconstructed fractal dimensions are shown in the fractal parameters window. The estimated binary sequence is displayed in the Menu Windowon the of FIG. 9 (“Estimate:’)

[0588]FIG. 9 Example of the output of the DECFC system

[0589] This estimated binary code after reconstruction is

[0590] 000011100001000000110000001000010010001001

[0591] This bit stream is then segmented into 7 bit fields 000011 0000100 0000110 0000010 0001001 0001001

[0592] and transformed into the following integer sequence

[0593] 7 4 6 2 9 9

[0594] Regenerating the random integer sequence (using the same Random key)and subtracting them from the integer sequence above we obtain $\begin{matrix} {7\quad 4\quad 6\quad 2\quad 9\quad 9} \\ \underset{\_}{6\quad 0\quad 2\quad 0\quad 6\quad 7} \end{matrix}$ $\underset{\_}{\quad {1\quad 4\quad 4\quad 2\quad 3\quad 2}}$

[0595] Each integer is then convereted into its corresponding bit form. 0 1111 0000 11 000 11

[0596] and concatinated into the following bit pattern. 0111100001 100011 Changing this bit sequence of into decimal form we obtain theASCII codes 120 99

[0597] and finally, transforming this ASCII codes into output characters, we reconstruct the original two-character set xc.

[0598] 11 Conclusions and Recommendations For Future Work

[0599] 11.1 Conclusions

[0600] The purpose of this report has been to give an overview of encryption techniques and to discuss the uses of Fractals and Chaos in data security.

[0601] Two principal areas have been considered:

[0602] (i) The role of chaos generating algorithms for producing psuedo-random numbers.

[0603] (ii) The application of the theorey of random scaling fractal signals coding and camouflaging bit streams.

[0604] Only one chaos generationg algorithm has been considered based on the Vurhulst process in order to test out some of the ideas presented. The method of fractal signal generation and fractal dimension segmentationis also only one of fractal signal generation and fractal dimension segmentationis also only one of many numerical approaches that could be considered buthas been used effectively in this work to demonstarte the principles of fractal coding.

[0605] 11.2 Recommendations For Future Work

[0606] Data Compression

[0607] As discussed in Chapter 4, there are many binary data compression techniques, but there are three main standards. First, there is a standard applied specifically to videoconferencing called H.261 (or, sometimes, px64) and has been formulated by the European Commission's Consultative Committee on International Telephony and Telegraphy (CCITT). Second is the Joint PhographicExperts Group (JPEG) which has now effectively created a standard for compressing still images. The third is called the Motion Picture Experts Group (MPEG). As the name suggests, MPEG seeks to define a standard for coding moving images and associated audio.

[0608] While standard schemes are likely to dominate the industry, there is still roomfor others. The most important is a scheme relying on a profound level of redndancyin form and shape in the natural word. It uses an approach known as Fractal Compression.

[0609] Future research in this area of work should include the applications of different data compression schemes and their use in the encryption techniques discussed in this report which has only considered runlength coding.

[0610] Chaos Generation

[0611] The advantages and diasvantages of using a chaos generator instead of a conventional pseudo-random number generator have not yet been fully investigated. The must include a complete study of the statistics associated with chaos based random number generators. Since there is an unlimited number of possible chaos generators to choose from, it might be possible to develop an encryption scheme which is based on a random selection of different chaos generating iterators, This approach could be integrated into a key hierarchy at many different levels.

[0612] Fractal Coding

[0613] The fractal noise model used in the coding operation is consistent with many noise types but is not as general as using a Power Spectral Density Function (PSDF) of the type

[0614] $$P(\omega)=A\omega{circumflex over ( )}2g\over (\omega_(—)0{circumflex over ( )}2+omega{circumflex over ( )}2){circumflex over ( )}q$$ to describe the noise field.

[0615] Further work is now required to determine the PSDFs of different transmission noises and to quantify them in terms of the parameters q, g and ω₀. In cases where the transmission noise is dominated by a PSDF of the form ω^(2q) the fractal model used here is sufficient. The development of a coding technique based on the parameters q, g and ω₀ could provide a greater degree of flexibility and allow the noise field to be tailored to suit a wider class of data transmission systems. The value of such a scheme with regard to the extra computational effort require to develop a robust inverse solution (i.e. recover the parameters q, g and ω₀) is a matter for further research.

SECTION 4

[0616] Title: “Improvements in or Relating to Image Processing”

[0617] THIS INVENTION relates to image processing and relates, more particularly, to a method of and apparatus for deriving from a plurality of “frames” of a video “footage”, a single image of a higher visual quality than the individual frames.

[0618] Anyone who has access to a conventional analogue video tape recorder with aframe freeze facility will be aware that the visual quality of a single frame in atypical video recording is subjectively significantly inferior to thenormally viewed (moving) video image. To a significant extent, of course, the quality of the (moving) video image provided by a domestic videorecorder is already significantly lower than that provided by directconversion of a typical of a transmitted TV signal, simply because of thereduced bandwidth of the video recorder itself, but nevertheless the factthat, to the human observer, the quality of the recorded video image seemsmuch better than that of the individual recorded frames suggests that thehuman eye/brain combination is, in effect, integrating the information froma whole series of video frames to arrive at a subjectively satisfying visual impression. It is one of the objects of the present invention to provide apparatus and a method for carrying out an analogous process to arrive at a “still” image, from a section of video footage, which is of significantly better visual quality than the individual “frames” of the same video footage.

[0619] According to one aspect of the present invention there is provided a method of processing a section of video “footage” to produce a “still” view of higher visual quality than the individual frames of that footage, comprising sampling, over a plurality of video “frames”, image quantities (such as brightness and hue or colour) for corresponding points over such frames, and processing the samples to produce a high quality “still” frame.

[0620] According to another aspect of the invention there is provided apparatus for processing a section of video footage to produce a “still” view of higher visual quality than the individual frames of that footage, the apparatus comprising means for receiving data in digital form corresponding to said frames, processing means for processing such data and producing digital data corresponding to an enhanced image based on such individual frames, and means for displaying or printing said enhanced image.

[0621] In the preferred mode of carrying out the invention, the video informationis processed digitally and accordingly, except where a digitised videosignal is already available, (for example, where the video signal is adigital TV signal or a corresponding video signal or comprises videofootage which has been recorded digitally) apparatus for carrying outthe invention may comprise means, known per se, for digitising analoguevideo frames or analogue video signals, whereby, for example, each videoframe is notionally divided up into rows and columns of “pixels” and digital data derived for each pixel, such digital data representing, for example, brightness, colour, (hue), etc. The invention may utilisevarious ways of processing the resulting data. For example, in one methodin accordance with the invention, the brightness and colour data for eachof a plurality of corresponding signals in a corresponding plurality ofsuccessive video frames, for example four or five successiveframes, may simply be averaged, thereby eliminating much high-frequency “noise”, (i.e. artefacts appearing only in individual frames and whichare not carried over several frames). In a situation where the sequenceof frames concerned was a sequence with minimal camera or subject movement, the “average” frame might correspond, noise reduction apart, with the video frame in the middle of that sequence. The processing apparatus is preferably also programmed to reject individual frames which differ significantly from this average and/or to determine when an “average” frame derived as indicated is so deficient in spatial frequencies in a predetermined range as to indicate that a sequence of frames selected encompasses a “cut” from one shot to another and so on. Thus a considerable amount of “pre-processing” is possible to ensure, as far as possible, that the frames actually processed are as little different from one another in picture content as possible. The views thus processed andaveraged may also be subjected to contrast enhancement and/or boundary/edgeenhancement techniques before further processing, or the further processingmay be arranged to effect any necessary contrast enhancement as well as enhancement in other respects. Section 4 of Part 2 of this section sets out in mathematical terms the techniques and algorithms which are preferably utilised in such further processing, as does Appendix A to said Part 2. Sections 1 to 3 of Part 2 of this Section provides background to Section 4 and discloses further techniques which may be utilised. All of these techniques are, of course, preferably implemented by means of a digital computer programmedwith a program incorporating steps which implement and correspond to themathematical procedures and steps set out in Part 2 of this Section.

[0622] It will be understood that the program followed may include variousrefinements, for example, adapted to identify “mass” displacement of pixelvalues from frame to frame due to camera movement or to movement of amajor part of the field of view, such as a moving subject, relative tothe camera, to identify direction of relative movement and use thisinformation in “de-blurring” efficiently, and also to take intoaccount the (known) scanning mechanism of the video system concerned, (in the case of TV or similar videofootage). The techniques used may include increase in the pixel density of the“still” image as compared with the digitised versions of the individual video frames (a species of the image reconstruction and super resolution referred to in Part 2 of this Section). Thus, in effect, the digitised versions of the individual video frames may be re-scaled to a higher density and image quantities for the “extra” pixels obtained by a sophisticated form of interpolation of values for adjoining pixels in the lower pixel density video frames.

[0623] Whilst it is envisaged that a primary use of the invention may be in derivingvisually acceptable “stills” from electronic video material, it will be understood that similar techniques can be applied to film material on “celluloid”. Furthermore, by applying the techniques in accordance with the invention tosuccessive sequences of, say, six or seven frames in succession, with theselected sequence of five or six frames being advanced by one frame at a time,(with appropriate allowance being made, as referred to above, for “cuts”, “fades”, and like cinematic devices), the invention may be applied to, for example, the restoration of antique film stock.

PART 2

[0624] Inverse Problems and Deconvolution:

[0625] An Introduction to Image Restoration and Reconstruction

[0626] Summary

[0627] All image formation systems are inherently resolution limited. Moreover, many images are blurred due to a variety of physical effects such as motion in the object or image planes, the effects of turbulence and refraction and/or diffraction.

[0628] When an image is recorded that has been degraded in this manner, a number of digitalimage processing techniques can employed to “de-blur” the image andenhance its information content. Nearly all ofthese techniques are either directly or indirectly based on a mathematical model for the blurred imagewhich involves the convolution of two functions—the Point Spread Functionand the Object Function. Hence, “de-blurring” an image amounts to solving the inverse problem posed by this model which is known as “Deconvolution”. Image restoration attempts to provide a resolution compatible with the bandwidth of the imaging system (a resolution limited system). Image reconstruction attempt to provide a resolution that is greater than the inherent resolution of the data (i.e. the resolution limit of the imaging system). This is often known as super resolution. In addition to this general problem, there is the specific problem of reconstructing an image from a set of projections; a problem which is the basis of Computed Tomography and quantified in termsof an integral transform known as the Radon transform.

[0629] With regard to the discussion above, the aim of this document is to discuss:

[0630] (i) basic methods of solution;

[0631] (ii) essential algorithms;

[0632] (iii) some applications Notation BL Band Limited DFT Discrete Fourier Transform IDFT Inverse Discrete Fourier Transform FFT Fast Fourier Transform SNR Signal-to-Noise Ratio

2D Convolution Operation ⊙⊙ 2D Correlation Operation {circumflex over (B)} Back-Projection Operator E Entropy {circumflex over (F)}₁ 1D Fourier Transform Operator {circumflex over (F)}₁ ⁻¹ Inverse iD Fourier Transform Operator {circumflex over (F)}₂ 2D Fourier Transform Operator {circumflex over (F)}₂ ⁻² Inverse 2D Fourier Transform Operator Ĥ Hilbert Transform Operator {circumflex over (R)} Radon Transform Operator {circumflex over (R)}⁻¹ Inverse Radon Transform Operator P Projection δ 1D Delta Function δ² 2D Delta Function κ 2D Spatial Frequency Vector k_(x), k_(y) Spatial Frequency Vectors η 2D Unit Vector Γ 2D Space Vector ∀ ‘Forall’ sinc sinc function sinc(

) = sin(

)/

f_(ij) Object Function {circumflex over (f)}_(ij) Least Square Estimate of f_(ij) n_(ij) Noise Function p_(ij) Point Spread Function s_(ij) Recorded Signal/Image c_(ij) Correlation Function F_(ij) DFT of f_(ij) {circumflex over (F)}_(ij) DFT of {circumflex over (f)}_(ij) N_(ij) DFT of n_(ij) P_(ij) DFT of p_(ij) P_(i) ^(*) _(j) Complex Conjugate of P_(ij) S_(ij) DFT of s_(ij) C_(ij) DFT of c_(ij)

[0633] 1. Introduction

[0634] The field of information science has brought about some of the most dramatic and important scientific development of the past twenty years. This has been due primarily to the massive increase in the power and availability of digital computers. One area of information technology which has grown rapidly as a result of this, has been digital signal and image processing. This subject has become increasingly important because of the growing demand to obtain information about the structure, composition and behaviour of objects without having to inspect them invasively. Deconvolution is a particularly important subject area in signal and image processing. In general, this problem is concerned with the restoration and/or reconstruction of information from known data and depends critically on a priori knowledge on the way in which the data (digital image for example) has been generated and recorded. Mathematically, the data obtained are usually related to some ‘Object Function’ via an integral transform. In this sense, deconvolution is concerned with inverting certain classes of integral equation—the convolution equation. In general, there is no exact or unique solution to the image restoration/reconstruction problem—it is an ill-posed problem. We attempt to find a ‘best estimate’ based on some physically viable criterion subject to certain conditions.

[0635] The fundamental imaging equation is given by

s=p

ƒ+n

[0636] where s, p, ƒ and n are the image, the Point Spread Function (PSF), the Object Function and the noise respectively. The symbol

denotes 2D convolution. The imaging equation is a stationary model for the image s in which the (blurring) effect of the PSF at any location in the ‘object plane’ is the same. Using the convolution theorem we can write this equation in the form

S=PF+N

[0637] where S, P, F and N are the (2D) Fourier transforms of s, p, ƒ and n respectively. Assuming that F is a broadband spectrum, there are two cases we should consider:

[0638] (i) P(k_(x), k_(y))→0 as (k_(x), k_(y))→∞ where k_(x) and k_(y) are the spatial frequencies in the x and y directions respectively. The image restoration problem can then be stated as ‘recover F given S’.

[0639] (ii) P(k_(x), k_(y)) is bandlimited, i.e. P(k_(x), k_(y))=0 for certain values of k_(x) and/or k_(y). The image reconstruction problem can then be stated as ‘given S reconstruct F’. This typically requires the frequency components to be ‘synthesized’ beyond the bandwidth of the data. This is a (spectral) extrapolation problem

[0640] The image restoration problem can typically involve finding a solution for ƒ given that s=p

ƒ+n where p is a Gaussian PSF given by (ignoring scaling)

p(x, y)=exp[−(x ² +y ²)/σ²]

[0641] (σ being the standard deviation) which has a spectrum of the form (ignoring scaling)

P(k _(x) , k _(y))=exp[−σ²(k _(x) ² +k _(y) ²)]

[0642] This PSF is a piecewise continuous function as is its spectrum.

[0643] An example of an image reconstruction problem is ‘find ƒ given that s=p

ƒ+n’ where p is given by (ignoring scaling)

p(x, y)=sinc(αx) sinc(βy)

[0644] This PSF has a spectrum of the form (ignoring scaling)

P(k _(x) , k _(y))=H _(α)(k _(x))H _(β)(k _(y))

[0645] where ${H_{\alpha}\left( k_{x} \right)} = \left\{ {{\begin{matrix} 1 & {{k_{x}} \leq \alpha} \\ 0 & {{k_{x}} > \alpha} \end{matrix}\quad {H_{\beta}\left( k_{y} \right)}} = \left\{ \begin{matrix} 1 & {{k_{y}} \leq \beta} \\ 0 & {{k_{y}} > \beta} \end{matrix} \right.} \right.$

[0646] It is a piecewise continuous function but its spectrum is discontinuous, the bandwidth of p

ƒ of being given by α in the x direction and β in the y direction.

[0647] 2. Restoration of Blurred Images

[0648] To put the problem in perspective, consider the discrete case, i.e.

s _(ij) =p _(ij)

ƒ_(ij) +n _(ij)

[0649] where s_(ij) is a digital image. Suppose we neglect the term n_(ij), then

s_(ij)=p_(ij)

ƒ_(ij)

[0650] or by the (discrete) convolution theorem

S_(ij)=P_(ij)F_(ij)

[0651] where S_(ij), P_(ij) and F_(ij) and the DFTs of s_(ij), p_(ij) and ƒ_(ij) respectively. Clearly, $F_{ij} = \frac{S_{ij}}{P_{ij}}$

[0652] and therefore $f_{ij} = {{IDTF}\left( \frac{S_{ij}}{P_{ij}} \right)}$ $\frac{1}{P_{ij}} = \frac{P_{ij}^{*}}{{P_{ij}}^{2}}$

[0653] which is called the Inverse Filter.

[0654] Suppose, we were to implement this result on a digital computer; if P_(ij) approached zero (in practice a very small number) for any value of i and/or j then depending on the compiler, the computer would respond with an output such as ‘ . . . arithmetic fault . . . divide by zero’. A simple solution would be to regularize the result, i.e. use $f_{ij} = {{IDTF}\left( \frac{P_{ij}^{*}S_{ij}}{{P_{ij}}^{2} + {constant}} \right)}$

[0655] and ‘play around’ with the value of the constant until ‘something sensible’ was obtained which in turn would depend on the a priori information available on the form and support of ƒ_(ij). The regularization of the inverse filter is the basis for some of the methods which are discussed in these notes. We start by considering the criterion associated with the inverse filter.

[0656] 2.1 The Inverse Filter

[0657] The criterion for the inverse filter is that the mean square of the noise is a minimum. Since

s _(ij) =p _(ij)

ƒ_(ij) +n _(ij)

[0658] we can write

n _(ij) =s _(ij) −p _(ij)

ƒ_(ij)

[0659] and therefore

e=∥n _(ij)∥² =∥s _(ij) −p _(ij)

ƒ_(ij)∥²

[0660] where ${x_{ij}} \equiv \left( {\sum\limits_{i}^{\quad}{\sum\limits_{j}^{\quad}x_{ij}^{2}}} \right)^{\frac{1}{2}}$

[0661] For the noise to be a minimum, we require $\frac{\partial e}{\partial f_{ij}} = 0$

[0662] Differentiating (see Appendix A), we obtain

(s _(ij) −p _(ij)

ƒ_(ij))⊙⊙p _(ij)=0

[0663] Using the convolution and correlation theorems, in Fourier space, this equation becomes

(S _(ij) −P _(ij) F _(ij))P* _(ij)=0

[0664] Hence, solving for F_(ij) we obtain the result $F_{ij} = {\frac{P_{ij}^{*}}{{P_{ij}}^{2}}S_{ij}}$

[0665] The inverse filter is therefore given by ${{Inverse}\quad {Filter}} = \frac{P_{ij}^{*}}{{P_{ij}}^{2}}$

[0666] In principle, the inverse filter provides an exact solution to the problem when the noise term n_(ij) can be neglected. However, in practice, this solution is fraught with difficulties. First, the inverse filter is invariably a singular function. Equally bad, is the fact that even if the inverse filter is not singular, it is usually ill conditioned. This is where the magnitude of P_(ij) goes to zero so quickly as (i, j) increases, that 1/ |P_(ij)|² rapidly acquires extremely large values. The effect of this is to amplify the (usually) noisy high frequency components of S_(ij). This can lead to a restoration ƒ_(ij) which is dominated by the noise in s_(ij). The inverse filter can therefore only be used when:

[0667] (i) The filter is non-singular.

[0668] (ii) The SNR of the data is very large (i.e. ∥p_(ij)

ƒ_(ij) ∥>>∥n _(ij)∥).

[0669] Such conditions are rare. A notable exception occurs in Computed Tomography which is covered in Section 5 of these notes in which the inverse filter associated with the ‘Back-Project and Deconvolution’ algorithm is non-singular.

[0670] The computational problems that arise from implementing the inverse filter can be avoided by using a variety of different filters whose individual properties and characteristics are suited to certain types of data. One of the most commonly used filters for image restoration is the Wiener filter which is considered next.

[0671] 2.2 The Wiener Filter

[0672] An algorithm shall be derived for deconvolving images that have been blurred by some lowpass filtering process and corrupted by additive noise. In mathematical terms, given the imaging equation

s _(ij) =p _(ij)

ƒ_(ij) +n _(ij)   (2.1)

[0673] the problem is to solve for ƒ_(ij) given s_(ij), p_(ij) and some knowledge of the SNR. This problem is solved using the least squares principle which provides a filter known as the Wiener filter.

[0674] The Wiener filter is based on considering an estimate {circumflex over (ƒ)}_(ij) for ƒ_(ij) of the form

{circumflex over (ƒ)}_(ij)=q_(ij)

s_(ij)   (2.2)

[0675] Given this model, our problem is reduced to computing q_(ij) or equivalently its Fourier transform Q_(ij). To do this, we make use of the error $\begin{matrix} {e = {{{f_{ij} - {\hat{f}}_{ij}}}^{2} \equiv {\sum\limits_{i}^{\quad}\quad {\sum\limits_{j}^{\quad}\left( {f_{ij} - {\hat{f}}_{ij}} \right)^{2}}}}} & (2.3) \end{matrix}$

[0676] and find q_(ij) such that e is a minimum, i.e. $\frac{\partial e}{\partial q_{ij}} = 0$

[0677] Substituting equation (2.2) into equation (2.3) and differentiating, we get $\frac{\partial e}{\partial q_{kl}} = {{{- 2}{\sum\limits_{i}^{\quad}\quad {\sum\limits_{j}^{\quad}{\left( {f_{ij} - {\sum\limits_{n}^{\quad}{\sum\limits_{m}^{\quad}{s_{{i - n},{j - m}}q_{n\quad m}}}}} \right)\frac{\partial}{\partial q_{kl}}{\sum\limits_{n}^{\quad}\quad {\sum\limits_{m}^{\quad}{s_{{i - n},{j - m}}q_{n\quad m}}}}}}}} = {{{- 2}{\sum\limits_{i}^{\quad}\quad {\sum\limits_{j}^{\quad}{\left( {f_{ij} - {\sum\limits_{n}^{\quad}\quad {\sum\limits_{m}^{\quad}{s_{{i - n},{j - m}}q_{n\quad m}}}}} \right)s_{{i - k},{j - l}}}}}} = 0}}$

[0678] Rearranging, we have ${\sum\limits_{i}^{\quad}\quad {\sum\limits_{j}^{\quad}{f_{ij}s_{{i - k},{j - l}}}}} = {\sum\limits_{i}^{\quad}\quad {\sum\limits_{j}^{\quad}{\left( {\sum\limits_{n}^{\quad}\quad {\sum\limits_{m}^{\quad}{s_{{i - n},{j - m}}q_{n\quad m}}}} \right)s_{{i - k},{j - l}}}}}$

[0679] The left hand side of the above equation is a discrete correlation of ƒ_(ij) with s_(ij) and the right hand side is a correlation of s_(ij) with the convolution $\sum\limits_{n}^{\quad}\quad {\sum\limits_{m}^{\quad}{s_{{i - n},{j - m}}q_{n\quad m}}}$

[0680] Using operator notation, it is convenient to write this equation in the form

ƒ_(ij) ⊙⊙s _(ij)=(q _(ij)

s _(ij))⊙⊙s _(ij)

[0681] Moreover, using the correlation and convolution theorems, the equation above can be written in Fourier space as

F_(ij)S*_(ij)=Q_(ij)S_(ij)S*_(ij)

[0682] which, after rearranging gives $Q_{ij} = \frac{S_{ij}^{*}F_{ij}}{{S_{ij}}^{2}}$

[0683] Now, in Fourier space, equation (2.1) becomes

S _(ij) =P _(ij) F _(ij) +N _(ij)

[0684] Using this result, we have S_(ij)^(*)F_(ij) = (P_(ij)F_(ij) + N_(ij))^(*)F_(ij) = P_(ij)^(*)F_(ij)² + N_(ij)^(*)F_(ij)

[0685] and S_(ij)^(*)² = S_(ij)S_(ij)^(*) = (P_(ij)F_(ij) + N_(ij))(P_(ij)F_(ij) + N_(ij))^(*) = P_(ij)²F_(ij)² + P_(ij)F_(ij)N_(ij)^(*) + N_(ij)P_(ij)^(*)F_(ij)^(*) + N_(ij)²

[0686] Hence, the filter Q_(ij) can be written in the form $Q_{ij} = \frac{{P_{ij}^{*}{F_{ij}}^{2}} + {N_{ij}^{*}F_{ij}}}{{{P_{ij}}^{2}{F_{ij}}^{2}} + D_{ij} + {N_{ij}}^{2}}$

[0687] where

D _(ij) =P _(ij) F _(ij) N* _(ij) +N _(ij) P* _(ij) F* _(ij)

[0688] Signal Independent Noise

[0689] This result can be simplified further by imposing a condition which is physically valid in the large majority of cases. The condition is that ƒ_(ij) and n_(ij) are uncorrelated, i.e.

ƒ_(ij)⊙⊙n_(ij)=0

[0690] and

n_(ij)⊙⊙ƒ_(ij)=0

[0691] In this case, the noise is said to be ‘signal independent’ and it follows from the correlation theorem that

F_(ij)N*_(ij)=0

[0692] and

N_(ij)F*_(ij)=0

[0693] This result allows us to cancel the cross terms present in the last expression for Q_(ij) (i.e. set D_(ij)=0 and N*_(ij) F _(ij)=0) leaving the formula $Q_{ij} = \frac{P_{ij}^{*}{F_{ij}}^{2}}{{{P_{ij}}^{2}{F_{ij}}^{2}} + {N_{ij}}^{2}}$

[0694] Finally, rearranging, we obtain the expression for the least squares or Wiener filter, $Q_{ij} = \frac{P_{ij}^{*}}{{P_{ij}}^{2} + {{N_{ij}}^{2}/{F_{ij}}^{2}}}$

[0695] Estimation of the Noise-to-Signal Power Ratio |F_(ij)|² / |N_(ij)|²

[0696] From the algebraic form of the Wiener Filter derived above, it is clear that this particular filter depends on:

[0697] (i) the functional form of the PSF P_(ij) that is used;

[0698] (ii) the functional form of |N_(ij)|² / |F_(ij)|².

[0699] The PSF of the system can usually be found by literally imaging a single point source which leaves us with the problem of estimating the noise-to-signal power ratio |N_(ij)|² / |F_(ij)|². This problem can be solved if one has access to two successive images recorded under identical conditions.

[0700] Consider two digital images denoted by s_(ij) and s′_(ij) of the same object function ƒ_(ij) recorded using the same PSF p_(ij) (i.e. imaging system) but at different times and hence with different noise fields n_(ij) and n′_(ij). These images are given by

s _(ij) =p _(ij)

ƒ_(ij) +n _(ij)

[0701] and

s′ _(ij) =p _(ij)

ƒ_(ij) +n′ _(ij)

[0702] respectively where the noise functions are uncorrelated and signal independent, i.e.

n_(ij)⊙⊙n′_(ij)=0   (2.4)

ƒ_(ij)⊙n_(ij)=n_(ij)⊙⊙ƒ_(ij)=0   (2.5)

[0703] and

ƒ_(ij)⊙⊙n′_(ij)=n′_(ij)⊙⊙ƒ_(ij)=0   (2.6)

[0704] We now proceed to compute the autocorrelation function of s_(ij) given by

c_(ij)=s_(ij)⊙⊙s_(ij)

[0705] Using the correlation theorem and employing equation (2.5) we get $\begin{matrix} {C_{ij} = {{S_{ij}S_{ij}^{*}} = {\left( {{P_{ij}F_{ij}} + N_{ij}} \right)\left( {{P_{ij}F_{ij}} + N_{ij}} \right)^{*}}}} \\ {= {{{P_{ij}}^{2}{F_{ij}}^{2}} + {N_{ij}}^{2}}} \end{matrix}$

[0706] where C_(ij) is the DFT of c_(ij). Next, we correlate s_(ij) with s′_(ij) giving the cross-correlation function

c′_(ij)=s_(ij)⊙⊙s′_(ij)

[0707] Using the correlation theorem again and this time, employing equations (2.4) and (2.6) we get $\begin{matrix} {C_{ij}^{\prime} = {{{P_{ij}}^{2}{F_{ij}}^{2}} + {P_{ij}F_{ij}N_{ij}^{\prime*}} + {N_{ij}P_{ij}^{*}F_{ij}^{*}} + {N_{ij}N_{ij}^{\prime*}}}} \\ {= {{P_{ij}}^{2}{F_{ij}}^{2}}} \end{matrix}$

[0708] The noise-to-signal ratio can now be obtained by dividing C_(ij) by C′_(ij) giving $\frac{C_{ij}}{C_{ij}^{\prime}} = {1 + \frac{{N_{ij}}^{2}}{{P_{ij}}^{2}{F_{ij}}^{2}}}$

[0709] Re-arranging, we obtain the result $\frac{{N_{ij}}^{2}}{{F_{ij}}^{2}} = {\left( {\frac{C_{ij}}{C_{ij}^{\prime}} - 1} \right){P_{ij}}^{2}}$

[0710] Note that both C_(ij) and C′_(ij) can be obtained from the available data s_(ij) and s′_(ij). Also, substituting this result into the formula for Q_(ij), we obtain an expression for the Wiener filter in terms of C_(ij) and C′_(ij) given by $Q_{ij} = {\frac{P_{ij}^{*}}{{P_{ij}}^{2}}\frac{C_{ij}^{\prime}}{C_{ij}}}$

[0711] In those cases where the user has access to successive recordings, the method of computing the noise-to-signal power ratio described above can be employed. The problem is that in many practical cases, one does not have access to successive images and hence, the cross-correlation function c′_(ij) cannot be computed. In this case, one is forced to make an approximation and consider a Wiener filter of the form ${{Wiener}\quad {Filter}} = \frac{P_{ij}^{*}}{{P_{ij}}^{2} + {constant}}$

[0712] The constant ideally reflects any available information on the average signal-to-noise ratio of the image. Typically, we consider an expression of the form ${constant} = \frac{1}{({SNR})^{2}}$

[0713] where SNR stands for Signal-to-Noise Ratio. In practice, the exact value of this constant must be chosen by the user.

[0714] Before attempting to deconvolve an image the user must at least have some a priori knowledge on the functional form of the Point Spread Function. Absence of this information leads to a method of approach known as ‘Blind Deconvolution’. A common technique is to assume that the Point Spread Function is Gaussian, i.e.

p _(ij)=exp[−(i ² +j ²)/2σ²]

[0715] where σ is the standard deviation which must be defined by the user. In this case, the user has control of two parameters:

[0716] (i) the standard deviation of the Gaussian PSF;

[0717] (ii) the SNR.

[0718] In practice, the user must adjust these parameters until a suitable ‘user optimized’ reconstruction is obtained. In other words, the Wiener filter must be ‘tuned’ to give a result which is acceptable based on the judgement and intuition of the user. This interactive approach to image restoration is just one of many practical problems associated with deconvolution which should ideally be executed in real time.

[0719] 2.3 The Power Spectrum Equalization Filter

[0720] As the name implies, the Power Spectrum Equalization (PSE) filter is based on finding an estimate {circumflex over (ƒ)}_(ij) whose power spectrum is equal to the power spectrum of the desired function ƒ_(ij). In other words, {circumflex over (ƒ)}_(ij) is obtained by employing the criterion

|F _(ij)|² |{circumflex over (F)} _(ij)|²

[0721] together with the linear convolution model

{circumflex over (ƒ)}_(ij)=q_(ij)

s_(ij)

[0722] Like the Wiener filter, the PSE filter also assumes that the noise is signal independent. Since

{circumflex over (F)} _(ij) =Q _(ij) S _(ij) =Q _(ij)(P _(ij) F _(ij) +N _(ij))

[0723] and given that N*_(ij)F_(ij)=0 and F*_(ij)N_(ij)=0, we have

|{circumflex over (F)} _(ij)|² ={circumflex over (F)} _(ij) {circumflex over (F)}* _(ij) =|Q _(ij)|²(|P _(ij)|² |F _(ij)|² +|N _(ij)|²)

[0724] Using the PSE criterion and solving for |Q_(ij)|, we obtain ${Q_{ij}} = \left( \frac{1}{{P_{ij}}^{2} + {{N_{ij}}^{2}/{F_{ij}}^{2}}} \right)^{\frac{1}{2}}$

[0725] In the absence of accurate estimates for the noise to signal power ratio, we approximate the PSE filter by ${{PSE}\quad {filter}} = \left( \frac{1}{{P_{ij}}^{2} + {constant}} \right)^{\frac{1}{2}}$

[0726] where ${constant} = \frac{1}{({SNR})^{2}}$

[0727] Note that the criterion used to derive this filter can be written in the form ${\sum\limits_{i}{\sum\limits_{j}\left( {{F_{ij}}^{2} - {{\hat{F}}_{ij}}^{2}} \right)}} = 0$

[0728] or using Parseval's theorem ${\sum\limits_{i}{\sum\limits_{j}\left( {{f_{ij}}^{2} - {{\hat{f}}_{ij}}^{2}} \right)}} = 0$

[0729] Compare this criterion with that use for the Wiener filter, i.e. ${Minimise}\quad {\sum\limits_{i}{\sum\limits_{j}\left( {f_{ij} - {\hat{f}}_{ij}} \right)^{2}}}$

[0730] 2.4 The Matched Filter

[0731] Matched filtering is based on correlating the image s_(ij) with the complex conjugate of the PSF p_(ij). The estimate {circumflex over (ƒ)}_(ij) of ƒ_(ij) can therefore be written as

{circumflex over (ƒ)}_(ij)=p*_(ij)⊙⊙s_(ij)

[0732] Assuming that n_(ij)=0, so that

s_(ij)=p_(ij)

ƒ_(ij)

[0733] we have

{circumflex over (ƒ)}_(ij)p*_(ij)

p_(ij)⊙⊙ƒ_(ij)

[0734] which in Fourier space is

{circumflex over (F)} _(ij) =|P _(ij)|² F _(ij)

[0735] Observe, that the amplitude spectrum of {circumflex over (F)}_(ij) is given by |P_(ij)|²|F_(ij)| and that the phase information is determined by F_(ij) alone.

[0736] Criterion For the Matched Filter

[0737] The criterion for the matched filter is as follows. Given that

s _(ij) =p _(ij)

ƒ_(ij) +n _(ij)

[0738] the match filter provides an estimate for ƒ_(ij) of the form

{circumflex over (ƒ)}_(ij=q) _(ij)

s_(ij)

[0739] where q_(ij) is chosen in such a way that the ratio $R = \frac{{{\sum\limits_{i}{\sum\limits_{j}{Q_{ij}P_{ij}}}}}^{2}}{\sum\limits_{i}{\sum\limits_{j}{{N_{ij}}^{2}{Q_{ij}}^{2}}}}$

[0740] is a maximum.

[0741] The matched filter Q_(ij) is found by first writing ${Q_{ij}P_{ij}} = {{N_{ij}}Q_{ij} \times \frac{P_{ij}}{N_{ij}}}$

[0742] and then using the inequality $\begin{matrix} {{{\sum\limits_{i}{\sum\limits_{j}{Q_{ij}P_{ij}}}}}^{2} = {{\sum\limits_{i}{\sum\limits_{j}{{N_{ij}}\quad Q_{ij}\frac{P_{ij}}{N_{ij}}}}}}^{2}} \\ {\leq {\sum\limits_{i}{\sum\limits_{j}{{N_{ij}}^{2}\quad {Q_{ij}}^{2}\quad {\sum\limits_{i}{\sum\limits_{j}\frac{{P_{ij}}^{2}}{{N_{ij}}^{2}}}}}}}} \end{matrix}$

[0743] From this result and the definition of R given above we get $R \leq {\sum\limits_{i}{\sum\limits_{j}\frac{{P_{ij}}^{2}}{{N_{ij}}^{2}}}}$

[0744] Now, recall that the criterion for the matched filter is that R is a maximum. If this is the case, then $R = {\sum\limits_{i}{\sum\limits_{j}\frac{{P_{ij}}^{2}}{{N_{ij}}^{2}}}}$

[0745] or ${{\sum\limits_{i}{\sum\limits_{j}{{N_{ij}}\quad Q_{ij}\frac{P_{ij}}{N_{ij}}}}}}^{2} = {\sum\limits_{i}{\sum\limits_{j}{{N_{ij}}^{2}\quad {Q_{ij}}^{2}\quad {\sum\limits_{i}{\sum\limits_{j}\frac{{P_{ij}}^{2}}{{N_{ij}}^{2}}}}}}}$

[0746] This is true, if and only if ${{N_{ij}}\quad Q_{ij}} = \frac{P_{ij}^{*}}{N_{ij}}$

[0747] because we then have ${\quad {\sum\limits_{i}{\sum\limits_{j}\frac{{P_{ij}}^{2}}{{N_{ij}}^{2}}}}}^{2} = {\sum\limits_{i}{\sum\limits_{j}{\frac{{P_{ij}}^{2}}{{N_{ij}}^{2}}\quad {\sum\limits_{i}{\sum\limits_{j}\frac{{P_{ij}}^{2}}{{N_{ij}}^{2}}}}}}}$

[0748] Thus, R is a maximum when $Q_{ij} = \frac{P_{ij}^{*}}{{N_{ij}}^{2}}$

[0749] The Matched Filter for White Noise

[0750] If the noise n_(ij) is white, then its power spectrum is can be assumed to be a constant, i.e. N_(ij)² = N₀²

[0751] In this case $Q_{ij} = \frac{P_{ij}^{*}}{N_{0}^{2}}$

[0752] and ${\hat{F}}_{ij} = {\frac{P_{ij}^{*}}{N_{0}^{2}}S_{ij}}$

[0753] Hence, for white noise, the match filter provides an estimate which may be written in the form ${\hat{f}}_{ij} = {\frac{1}{N_{0}^{2}}{p_{ij}^{*} \odot {\odot s_{ij}}}}$

[0754] Deconvolution of Linear Frequency Modulated PSFs

[0755] The matched filter is frequently used in coherent imaging systems whose PSF is characterized by a linear frequency modulated response. Two well known examples are Synthetic Aperture Radar and imaging systems that use (Fresnel) zone plates. In this section, we shall consider a separable linear FM PSF and also switch to a continuous noise free functional form which makes the analysis easier. Thus, consider the case when the PSF is given by

p(x, y)=exp(iαx ²) exp(iβy ²); |x|≦X, |y|≦Y

[0756] where α and β are constants and X and Y determine the spatial support of the PSF. The phase of this PSF (in the x-direction say) is αx² and the instantaneous frequency is given by ${\frac{}{x}\left( {\alpha \quad x^{2}} \right)} = {2\quad \alpha \quad x}$

[0757] which varies linearly with x. Hence, the frequency modulations (in both x and y) are linear which is why the PSF is referred to as a linear FM PSF. In this case, the image that is obtained is given by (neglecting additive noise)

s(x, y)=exp(iαx ²) exp(iβy ²)

ƒ(x, y); |x|≦X, |y|≦Y

[0758] Matched filtering, we get

{circumflex over (ƒ)}(x, y)=exp(−iαx ²) exp(−iβy ²)⊙⊙exp(iαx ²) exp(iβy ²)

ƒ(x, y)

[0759] Now, $\begin{matrix} {{{\exp \left( {{- }\quad \alpha \quad x^{2}} \right)} \odot {\exp \left( {{\alpha}\quad x^{2}} \right)}} = {\int_{{- X}/2}^{X/2}{{\exp \left\lbrack {{- }\quad {\alpha \left( {z + x} \right)}^{2}} \right\rbrack}{\exp \left( {\quad \alpha \quad z^{2}} \right)}{z}}}} \\ {= {{\exp \left( {{- }\quad \alpha \quad x^{2}} \right)}\quad {\int_{{- X}/2}^{X/2}{\exp \quad \left( {2\quad \alpha \quad {zx}} \right){z}}}}} \end{matrix}$

[0760] Evaluating the integral over z, we have

exp(−iαx ²)⊙exp(iαx ²)=X exp(−iαx ²) sinc(αXx)

[0761] Since the evaluation of the correlation integral over y is identical, we can write

{circumflex over (ƒ)}(x, y)=XY exp(−iαx ²) exp(−iβy ²) sinc(αXx) sinc(βYy)

ƒ(x, y)

[0762] In many systems the spatial support of the linear FM PSF is relatively long. In this case,

cos(αx²) sinc(αXx)≃sinc(αXx), cos(βy²) sinc(βYy)≃sinc(βYy)

[0763] and

sin(αx²) sinc(αXx)≃0, sin(βy²) sinc(βYy)≃0

[0764] and so

{circumflex over (ƒ)}(x, y)≃XY sinc(αXx) sinc(βYy)

ƒ(x, y)

[0765] In Fourier space, this last equation can be written as ${\hat{F}\left( {k_{x},k_{y}} \right)} = \left\{ \begin{matrix} {{\frac{\pi^{2}}{\alpha \quad \beta}{F\left( {k_{x},k_{y}} \right)}},} & {{{k} \leq {\alpha \quad X}},{{{k} \leq {\beta \quad Y}};}} \\ {0,} & {otherwise} \end{matrix} \right.$

[0766] The estimate {circumflex over (ƒ)} is therefore a band limited estimate of ƒ whose bandwidth is determined by the product of the parameters α and β with the spatial supports X and Y respectively. Note, that the larger the values of αX and βY, the greater the bandwidth of the reconstruction.

[0767] 2.5 Constrained Deconvolution

[0768] Constrained deconvolution provides a filter which gives the user additional control over the deconvolution process. This method is based on minimizing a linear operation on the object ƒ_(ij) of the form g_(ij)

ƒ_(ij) subject to some other constraint. Using the least squares approach, we find an estimate for ƒ_(ij) by minimizing ∥g_(ij)

ƒ_(ij)∥² subject to the constraint

∥s _(ij) −p _(ij)

ƒ_(ij)∥² =∥n _(ij)∥²

[0769] where ${x_{ij}}^{2} \equiv {\sum\limits_{i}{\sum\limits_{j}x_{ij}^{2}}}$

[0770] Using this result, we can write

∥g _(ij)

ƒ_(ij)∥² =∥g _(ij)

ƒ_(ij)∥²+λ(∥s _(ij) −p _(ij)

ƒ_(ij)∥² −∥n _(ij)∥²)

[0771] because the quantity inside the brackets on the right hand side is zero. The constant λ is called the Lagrange multiplier. Using the orthogonality principle (see Appendix A), ∥g_(ij)

ƒ_(ij)∥² is a minimum when

(g _(ij)

ƒ_(ij))⊙⊙g _(ij)−λ(s _(ij) −p _(ij)

ƒ_(ij))⊙⊙p _(ij)=0

[0772] In Fourier space, this equation becomes

|G _(ij)|² F _(ij)−λ(S _(ij) P* _(ij) −|P _(ij)|² F _(ij))=0

[0773] Solving for F_(ij), we get $F_{ij} = \frac{S_{ij}P_{ij}^{*}}{{P_{ij}}^{2} + {\gamma \quad {G_{ij}}^{2}}}$

[0774] where γ is the reciprocal of the Lagrange multiplier (=1/λ). Hence, the constrained least squares filter is given by ${{Constrained}\quad {Least}\quad {Squares}\quad {Filter}} = \frac{P_{ij}^{*}}{{P_{ij}}^{2} + {\gamma \quad {G_{ij}}^{2}}}$

[0775] The important point about this filter is that it allows the user to change G_(ij) to suite a particular application. This filter can be thought of as a generalization of the other filters. For example, if γ=0 then the inverse filter is obtained, if γ=1 and |G_(ij)|²=|N_(ij)|² / |F_(ij)|² then the Wiener filter is obtained, and if γ=1 and |G_(ij)|²=|N_(ij)|²−|P_(ij)|² then the matched filter it obtained.

[0776] The following table lists the filters discussed so far. In each case, the filter Q_(ij) provides a solution to the inversion of the following equation

s _(ij) =p _(ij)

ƒ_(ij) +n _(ij),

[0777] the solution for ƒ_(ij) being given by

ƒ_(ij)=IDFT[Q_(ij)S_(ij)]

[0778] where IDFT stands for the 2D Discrete Inverse Fourier Transform and S_(ij) is the DFT of the digital image s_(ij). In all cases, the DFT and IDFT can be computed using a FFT. Name of Filter Formula Condition(s) Inverse Filter Q_(ij) = P_(ij)^(*)/P_(ij)|²

Minimize  n_(ij)

Wiener Filter $Q_{ij} = \frac{P_{ij}^{*}}{{P_{ij}}^{2} + {{F_{ij}}^{2}/{N_{ij}}^{2}}}$

$\begin{matrix} {{{Minimize}\quad {{f_{ij} - {q_{ij} \otimes \quad {\otimes s_{ij}}}}}^{2}};} \\ {{{N_{ij}^{*}F_{ij}} = 0},{{F_{ij}^{*}N_{ij}} = 0}} \end{matrix}\quad$

PSE Filter $Q_{ij} = \left( \frac{1}{{P_{ij}}^{2} + {{F_{ij}}^{2}/{N_{ij}}^{2}}} \right)^{\frac{1}{2}}$

$\begin{matrix} {{{F_{ij}}^{2} = {{Q_{ij}S_{ij}}}^{2}};} \\ {{{N_{ij}^{*}F_{ij}} = 0},{{F_{ij}^{*}N_{ij}} = 0}} \end{matrix}\quad$

Matched Filter Q_(ij) = P_(ij)^(*)/N_(ij)²

${Maximize}\quad \frac{{{\sum\limits_{i}{\sum\limits_{j}{Q_{ij}P_{ij}}}}}^{2}}{\sum\limits_{i}{\sum\limits_{j}{{N_{ij}}^{2}{Q_{ij}}^{2}}}}$

Constrained Filter $Q_{ij} = \frac{P_{ij}^{*}}{{P_{ij}}^{2} + {\gamma {G_{ij}}^{2}}}$

Minimize  g_(ij)⊗  ⊗  f_(ij)²

[0779] 2.5 Maximum Entropy Deconvolution

[0780] As before, we are interested in solving the imaging equation

s _(ij) =p _(ij)

ƒ_(ij) +n _(ij)

[0781] for the object ƒ_(ij). Instead of using a least squares error to constrain the solution for ƒ_(ij), we choose to find ƒ_(ij) such that the entropy E, given by $E = {- {\sum\limits_{i}{\sum\limits_{j}{f_{ij}\quad \ln \quad f_{ij}}}}}$

[0782] is a maximum. Note, that because the ln function is used in defining the Entropy, the Maximum Entropy Method (MEM) must be restricted to cases where ƒ_(ij) is real and positive.

[0783] From the imaging equation above, we can write ${s_{ij} - {\sum\limits_{n}{\sum\limits_{m}{p_{{i - n},{j - m}}f_{n\quad m}}}}} = n_{ij}$

[0784] where we have just written the convolution operation out in full. Squaring both sides and summing over i and j we can write ${{\sum\limits_{i}{\sum\limits_{j}\left( {s_{ij} - {\sum\limits_{n}{\sum\limits_{m}{p_{{i - n},{j - m}}f_{n\quad m}}}}} \right)^{2}}} - {\sum\limits_{i}{\sum\limits_{j}n_{ij}^{2}}}} = 0$

[0785] But this equation is true for any constant λ multiplying both terms on the left hand side. We can therefore write the equation for E as $\begin{matrix} {E = {{- {\sum\limits_{i}{\sum\limits_{j}{f_{ij}\ln \quad f_{ij}}}}} + {\lambda\left\lbrack {{\sum\limits_{i}{\sum\limits_{j}\left( {s_{ij} - {\sum\limits_{n}{\sum\limits_{m}{p_{{n - i},{m - j}}f_{n\quad m}}}}} \right)^{2}}} -} \right.}}} \\ \left. {\sum\limits_{i}{\sum\limits_{j}n_{ij}^{2}}} \right\rbrack \end{matrix}$

[0786] because the second term on the right hand side is zero anyway (for all values of the Lagrange multiplier λ). Given this equation, our problem is to find ƒ_(ij) such that the entropy E is a maximum, i.e. $\frac{\partial E}{\partial f_{ij}} = 0$

[0787] Differentiating (an exercise which will be left to the reader), and switching to the notation for 2D convolution

and 2D correlation ⊙⊙, we find that E is a maximum when

1+ln ƒ_(ij)−2λ(s _(ij) ⊙⊙p _(ij) −p _(ij)

ƒ_(ij) ⊙⊙p _(ij))=0

[0788] or, after rearranging,

ƒ_(ij)=exp[−1+2λ(s _(ij) ⊙⊙p _(ij) −p _(ij)

ƒ_(ij) ⊙⊙p _(ij))]

[0789] This equation is transcendental in ƒ_(ij) and as such, requires that ƒ_(ij) is evaluated iteratively, i.e. f_(ij)^(k + 1) = exp [−1 + 2λ(s_(ij) ⊙ ⊙p_(ij) − p_(ij) ⊗ ⊗f_(ij)^(k) ⊙ ⊙p_(ij))]; k = 0, 1, 2, …  , N

[0790] where ƒ_(ij) ⁰=0 ∀ i, j say. The rate of convergence of this solution is determined by the value of the Lagrange multiplier that is used.

[0791] In general, the iterative nature of this nonlinear estimation method is undesirable, primarily because it is time consuming and may require many iterations before a solution is achieved with a desired tolerance.

[0792] We shall end this section by demonstrating a rather interesting result which is based on linearizing the MEM. This is achieved by retaining the first two terms (i.e. the linear terms) in the series representation of the exponential function leaving us with the following equation

ƒ_(ij)=2λ(s _(ij) ⊙⊙p _(ij) −p _(ij)

ƒ_(ij) ⊙⊙p _(ij))

[0793] Using the convolution and correlation theorems, in Fourier space, this equation becomes

F _(ij)=2λS _(ij) P* _(ij)−2λ|P _(ij)|² F _(ij)

[0794] Rearranging, we get $F_{ij} = \frac{S_{ij}P_{ij}^{*}}{{P_{ij}}^{2} + {{1/2}\quad \lambda}}$

[0795] Hence, we can define a linearized maximum entropy filter of the form $\frac{P_{ij}^{*}}{{P_{ij}} + {{1/2}\quad \lambda}}$

[0796] Notice, that this filter is very similar to the Wiener filter. The only difference is that the Wiener filter is regularized by a constant determined by the SNR of the data whereas this filter is regularized by a constant determined by the Lagrange multiplier.

[0797] 3. Bayesian Estimation

[0798] The processes discussed so far do not take into account the statistical nature of the noise inherent in a digital signal or image. To do this, another type of approach must be taken which is based on a result in probability theory called Bayes rule named after the English mathematician Thomas Bayes.

[0799] The Probability of an Event

[0800] Suppose we toss a coin, observe whether we get heads or tails and then repeat this process a number of times. As the number of trials increases, we expect that the number of times heads or tails occurs is half that of the number of trials. In other words, the probability of getting heads is ½ and the probability of getting tails is also ½. Similarly, if a dice with six faces is thrown repeatedly, then the probability of it landing on any one particular face is ⅙. In general, if an experiment is repeated N times and an event A occurs n times say, then the probability of this event P(A) is defined as ${P(A)} = {\lim\limits_{N->\infty}\left( \frac{n}{N} \right)}$

[0801] The probability is the relative frequency of an event as the number of trials tends to infinity. In practice, only a finite number of trials can be conducted and we therefore define the probability of an event A as ${P(A)} \simeq \frac{n}{N}$

[0802] where it is assumed that N is large.

[0803] The Joint Probability

[0804] Suppose we have two coins which we label C₁ and C₂. We toss both coins simultaneously N times and record the number of times C₁ is heads, the number of times C₂ is heads and the number of times C₁ and C₂ are heads together. What is the probability that C₁ and C₂ are heads together? Clearly, if m is the number of times out of N trials that heads occurs simultaneously, then the probability of such an event must be given by ${P\left( {C_{1}\quad {heads}\quad {and}\quad C_{2}\quad {heads}} \right)} = \frac{m}{N}$

[0805] This is known as the joint probability of C₁ being heads when C₂ is heads. In general, if two events A and B are possible and m is the number of times both events occur simultaneously, then the joint probability is given by ${P\left( {A\quad {and}\quad B} \right)} = \frac{m}{N}$

[0806] The Conditional Probability

[0807] Suppose we setup an experiment in which two events A and B can occur. We conduct N trials and record the number of times A occurs (which is n) and the number of times A and B occur simultaneously (which is m). In this case, the joint probability may written as ${P\left( {A\quad {and}\quad B} \right)} = {\frac{m}{N} = {\frac{m}{n} \times \frac{n}{N}}}$

[0808] Now, the quotient n/N is the probability P(A) that event A occurs. The quotient m/n is the probability that events A and B occur simultaneously given that event A has occurred. The latter probability is known as the conditional probability and is written as ${P\left( B \middle| A \right)} = \frac{m}{n}$

[0809] where the symbol B |A means ‘B given A’. Hence, the joint probability can be written as

P(A and B)=P(A)P(B |A)

[0810] Suppose that we do a similar type of experiment but this time we record the number of times p that event B occurs and the number of times q that event A occurs simultaneously with event B. In this case, the joint probability of events B and A occurring together is given by ${P\left( {B\quad {and}\quad A} \right)} = {\frac{q}{N} = {\frac{q}{p} \times \frac{p}{N}}}$

[0811] The quotient p/N is the probability P(B) that event B occurs and the quotient q/p is the probability of getting events B and A occurring simultaneously given that event B has occurred. The latter probability is just the probability of getting ‘A given B’, i.e. ${P\left( A \middle| B \right)} = \frac{q}{p}$

[0812] Hence, we have

P(B and A)=P(B)P(A |B)

[0813] Bayes Rule

[0814] The probability of getting A and B occurring simultaneously is exactly the same as getting B and A occurring simultaneously, i.e.

P(A and B)=P(B and A)

[0815] Hence, by using the definition of these joint probabilities in terms of the conditional probabilities we arrive at the following formula

P(A)P(B|A)=P(B)P(A |B)

[0816] or alternatively ${P\left( B \middle| A \right)} = \frac{{P(B)}{P\left( A \middle| B \right)}}{P(A)}$

[0817] This result is known as Bayes rule. It relates the conditional probability of ‘B given A’ to that of ‘A given B’.

[0818] Bayesian Estimation in Signal and Image Processing

[0819] In signal and image analysis Bayes rule is written in the form ${P\left( f \middle| s \right)} = \frac{{P(f)}{P\left( s \middle| f \right)}}{P(s)}$

[0820] where ƒ is the object that we want to recover from the signal

s(x)=p(x)

ƒ(x)+n(x)

[0821] or image

s(x, y)=p(x, y)

ƒ(z, y)+n(x, y)

[0822] This result is the basis for a class of restoration methods which are known collectively as Bayesian estimators.

[0823] Bayesian estimation attempts to recover ƒ in such a way that the probability of getting ƒ given s is a maximum. In practice, this is done by assuming that P(ƒ) and P(s |ƒ) obey certain statistical distributions which are consistent with the experiment in which s is measured. In other words, models are chosen for P(ƒ) and P(s |ƒ) and then ƒ is computed at the point where P(ƒ|s) reaches its maximum value. This occurs when ${\frac{\partial}{\partial f}{P\left( f \middle| s \right)}} = 0$

[0824] The function P is the Probability Density Function (PDF). The PDF P(ƒ|s) is called the a posteriori PDF. Since the logarithm of a function varies monotonically with that function, the a posteriori PDF is also a maximum when ${\frac{\partial}{\partial f}\ln \quad {P\left( f \middle| s \right)}} = 0$

[0825] Now, using Bayes rule, we can write this equation as ${{\frac{\partial}{\partial f}\ln \quad {P\left( s \middle| f \right)}} + {\frac{\partial}{\partial f}\ln \quad {P(f)}}} = 0$

[0826] Because the solution to this equation for ƒ maximizes the a posteriori PDF, this method is known as the Maximum a Posterior or MAP method. To illustrate the principles of Bayesian estimation, we shall now present some simple examples of how this technique can be applied to data analysis.

[0827] Bayesian Estimation—Example 1

[0828] Suppose that we measure a single sample s (one real number) in an experiment where it is known a priori that

s=ƒ+n

[0829] where n is noise (a single random number). Suppose that it is also known a priori that the noise is determined by a Gaussian distribution of the form (ignoring scaling)

P(n)=exp(−n ²/σ_(n) ²)

[0830] where σ_(n) ² is the standard deviation of the noise. Now, the probability of measuring s given ƒ—i.e. the conditional probability P(s |ƒ)—is determined by the noise since

n=s−ƒ

[0831] We can therefore write

P(s|ƒ)=exp[−(s−ƒ)²/σ_(n) ²]

[0832] To find the MAP estimate, the PDF for ƒ must also be known. Suppose that ƒ also has a zero-mean Gaussian distribution of the form

P(ƒ)=exp(−ƒ²/σ_(ƒ) ²)

[0833] Then, ${{\frac{\partial}{\partial f}\ln \quad {P\left( s \middle| f \right)}} + {\frac{\partial}{\partial f}\ln \quad {P(f)}}} = {{\frac{2\left( {s - f} \right)}{\sigma_{n}^{2}} - \frac{2\quad f}{\sigma_{f}^{2}}} = 0}$

[0834] Solving this equation for ƒ gives $f = \frac{s\quad \Gamma^{2}}{1 + \Gamma^{2}}$

[0835] where Γ is the SNR defined by $\Gamma = \frac{\sigma_{f}}{\sigma_{n}}$

[0836] Notice, that as σ_(n)→0, ƒ→s which must be true since s=ƒ+n and n has a zero-mean Gaussian distribution. Also, note that the solution we acquire for ƒ is entirely dependent on the a priori information we have on the PDF for ƒ. A different PDF produces an entirely different solution. For example, suppose it is known or we have good reason to assume that ƒ obeys a Rayleigh distribution of the form

P(ƒ)=ƒexp(−ƒ²/σ_(ƒ) ²), ƒ≧0

[0837] In this case, ${\frac{\partial}{\partial f}\ln \quad {P(f)}} = {\frac{1}{f} - \frac{2f}{\sigma_{f}^{2}}}$

[0838] and assuming that the noise obeys the same zero-mean Gaussian distribution ${{\frac{\partial}{\partial f}\ln \quad {P\left( s \middle| f \right)}} + {\frac{\partial}{\partial f}\ln \quad {P(f)}}} = {{\frac{2\left( {s - f} \right)}{\sigma_{n}^{2}} + \frac{1}{f} - \frac{2f}{\sigma_{f}^{2}}} = 0}$

[0839] This equation is quadratic in ƒ. Solving it, we get $f = {\frac{s\quad \Gamma^{2}}{2\left( {1 + \Gamma^{2}} \right)}\left( {1 \pm \sqrt{1 + {\frac{2\quad \sigma_{n}^{2}}{s^{2}}\left( {1 + \frac{1}{\Gamma^{2}}} \right)}}} \right)}$

[0840] The solution for ƒ which maximizes the value of P(ƒ|s), can then be written in the form $f = {\frac{s}{2\quad a}\left( {1 + \sqrt{1 + \frac{2a\quad \sigma_{n}^{2}}{s^{2}}}} \right)}$ where $a = {1 + \frac{1}{\Gamma^{2}}}$

[0841] This is a nonlinear estimate for ƒ. If $\frac{\sigma_{n}\sqrt{2a}}{s}{\operatorname{<<}1}$ then $f \simeq \frac{s}{a}$

[0842] In this case, ƒ is linearly related to s. In fact, this linearized estimate is identical to the MAP estimate obtained earlier where it was assumed that ƒ had a Gaussian distribution.

[0843] From the example given above, it should now be clear that Bayesian estimation (i.e. the MAP method) is only as good as the a priori information on the statistical behaviour of ƒ—the object for which we seek a solution. However, when P(ƒ) is broadly distributed compared with P(s |ƒ), the peak value of the a posteriori PDF will lie close to the peak value of P(ƒ). In particular, if P(ƒ) is roughly constant, then $\frac{\partial}{\partial f}\ln \quad {P(f)}$

[0844] is close to zero and therefore ${\frac{\partial}{\partial f}\ln \quad {P\left( f \middle| s \right)}} \simeq {\frac{\partial}{\partial f}\ln \quad {P\left( s \middle| f \right)}}$

[0845] In this case, the a posteriori PDF is a maximum when ${\frac{\partial}{\partial f}\ln \quad {P\left( s \middle| f \right)}} = 0$

[0846] The estimate for ƒ that is obtained by solving this equation for ƒ is called the Maximum Likelihood or ML estimate. To obtain this estimate, only a priori knowledge on the statistical fluctuations of the conditional probability is required. If, as in the previous example, we assume that the noise is a zero-mean Gaussian distribution, then the ML estimate is given by

ƒ=s

[0847] Note that this is the same as the MAP estimate when the standard deviation of the noise is zero.

[0848] The basic and rather important difference between the MAP and ML estimates is that the ML estimate ignores a priori information about the statistical fluctuations of the objectƒ. It only requires a model for the statistical fluctuations of the noise. For this reason, the ML estimate is usually easier to compute. It is also the estimate to use in cases where there is a complete lack of knowledge about the statistical behaviour of the object.

[0849] Bayesian Estimation—Example 2

[0850] To further illustrate the difference between the MAP and ML estimate and to show their use in signal analysis, consider the case where we measure N samples of a real signal s_(i) in the presence of additive noise n_(i) which is the result of transmitting a known signal ƒ_(i) modified by a random amplitude factor a. The samples of the signal are given by

s _(i) =aƒ _(i) +n _(i) , i=1, 2, . . . , N

[0851] The problem is to find an estimate for a. To solve problems of this type using Bayesian estimation, we must introduce multidimensional probability theory. In this case, the PDF is a function of not just one number s but a set of numbers s₁, s₂, . . . , s_(N). It is therefore a vector space. To emphasize this, we use the vector notation

P(s)≡P(s_(i))≡P(s₁, s₂, s₃, . . . , s_(N))

[0852] The ML estimate is given by solving the equation ${\frac{\partial}{\partial a}\ln \quad {P\left( s \middle| a \right)}} = 0$

[0853] for a. Let us again assume that the noise is described by a zero-mean Gaussian distribution of the form ${{P(n)} \equiv {P\left( {n_{1},n_{2},\ldots \quad,n_{N}} \right)}} = {\exp \left( {{- \frac{1}{\sigma_{n}^{2}}}{\sum\limits_{i = 1}^{N}n_{i}^{2}}} \right)}$

[0854] The conditional probability is then given by ${P\left( s \middle| a \right)} = {\exp \left( {{- \frac{1}{\sigma_{n}^{2}}}{\sum\limits_{i = 1}^{N}\left( {s_{i} - {af}_{i}} \right)^{2}}} \right)}$ and ${\frac{\partial}{\partial a}\ln \quad {P\left( s \middle| a \right)}} = {{\frac{2}{\sigma_{n}^{2}}{\sum\limits_{i = 1}^{N}{\left( {s_{i} - {af}_{i}} \right)f_{i}}}} = 0}$

[0855] Solving this last equation for a we obtain the ML estimate $a = \frac{\sum\limits_{i = 1}^{N}{s_{i}f_{i}}}{\sum\limits_{i = 1}^{N}f_{i}^{2}}$

[0856] The MAP estimate is obtained by solving the equation ${{\frac{\partial}{\partial a}\ln \quad {P\left( s \middle| a \right)}} + {\frac{\partial}{\partial a}\ln \quad {P(a)}}} = 0$

[0857] for a. Using the same distribution for the conditional PDF, let us assume that a has a zero-mean Gaussian distribution of the form

P(a)=exp(−a ²/σ_(a) ²)

[0858] where σ_(a) ² is the standard deviation. In this case, ${\frac{\partial}{\partial a}\ln \quad {P(a)}} = {- \frac{2a}{\sigma_{a}^{2}}}$

[0859] and hence, the MAP estimate is obtained by solving the equation ${{\frac{\partial}{\partial a}\ln \quad {P\left( s \middle| a \right)}} + {\frac{\partial}{\partial a}\ln \quad {P(a)}}} = {{{\frac{2}{\sigma_{n}^{2}}{\sum\limits_{i = 1}^{N}{\left( {s_{i} - {af}_{i}} \right)f_{i}}}} - \frac{2a}{\sigma_{a}^{2}}} = 0}$

[0860] for a. The solution to this equation is given by $a = \frac{\frac{\sigma_{a}^{2}}{\sigma_{n}^{2}}{\sum\limits_{i = 1}^{N}{s_{i}f_{i}}}}{1 + {\frac{\sigma_{a}^{2}}{\sigma_{n}^{2}}{\sum\limits_{i = 1}^{N}f_{i}^{2}}}}$

[0861] Note, that if σ_(a)>>σ_(n), then, $a \simeq \frac{\sum\limits_{i = 1}^{N}{s_{i}f_{i}}}{\sum\limits_{i = 1}^{N}f_{i}^{2}}$

[0862] which is the same as the ML estimate.

[0863] 3.1 The Maximum Likelihood Filter

[0864] In the last section, the principles of Bayesian estimation were presented. We shall now use these principles to design deconvolution algorithms for digital images under the assumption that the statistics are Gaussian. The problem is as follows: Given the real digital image $s_{ij} = {{\sum\limits_{n}{\sum\limits_{m}{p_{{i - n},{j - m}}f_{n\quad m}}}} + n_{ij}}$

[0865] find an estimate for ƒ_(ij) when p_(ij) is known together with the statistics for n_(ij). In this section, the ML estimate for ƒ_(ij) is determined by solving the equation ${\frac{\partial}{\partial f_{ij}}\ln \quad {P\left( s_{ij} \middle| f_{ij} \right)}} = 0$

[0866] As before, the algebraic form of the estimate depends upon the model that is chosen for the PDF. Let us assume that the noise has a zero-mean Gaussian distribution. In this case, the conditional PDF is given by ${P\left( s_{ij} \middle| f_{ij} \right)} = {\exp \left\lbrack {{- \frac{1}{\sigma_{n}^{2}}}{\sum\limits_{i}{\sum\limits_{j}\left( {s_{ij} - {\sum\limits_{n}{\sum\limits_{m}{p_{{i - n},{j - m}}f_{n\quad m}}}}} \right)^{2}}}} \right\rbrack}$

[0867] where σ_(n) ² is the standard deviation of the noise. Substituting this result into the previous equation and differentiating, we get ${\frac{2}{\sigma_{n}^{2}}{\sum\limits_{i}{\sum\limits_{j}{\left( {s_{ij} - {\sum\limits_{n}{\sum\limits_{m}{p_{{i - n},{j - m}}f_{n\quad m}}}}} \right)p_{{i - k},{j - l}}}}}} = 0$ or ${\sum\limits_{i}{\sum\limits_{j}{s_{ij}p_{{i - k},{j - l}}}}} = {\sum\limits_{i}{\sum\limits_{j}{\left( {\sum\limits_{n}{\sum\limits_{m}{p_{{i - n},{j - m}}f_{n\quad m}}}} \right)p_{{i - k},{j - l}}}}}$

[0868] Using the appropriate symbols, we may write this equation in the form

s _(ij) ⊙⊙p _(ij)=(p _(ij)

ƒ_(ij))⊙⊙p _(ij)

[0869] where ⊙⊙ and

denote the 2D correlation and convolution sums respectively. The ML estimate is obtained by solving the equation above for ƒ_(ij). This can be done by transforming it into Fourier space. Using the correlation and convolution theorems, in Fourier space this equation becomes S_(ij)P_(ij)^(*) = (P_(ij)F_(ij))P_(ij)^(*) and  thus $f_{ij} = {{{IDFT}\left( F_{ij} \right)} = {{IDFT}\left( \frac{S_{ij}P_{ij}^{*}}{{P_{ij}}^{2}} \right)}}$

[0870] where IDFT is taken to denote the (2D) Inverse Discrete Fourier Transform. Hence for Gaussian statistics, the ML filter is given by ${{ML}\quad {Filter}} = \frac{P_{m}^{*}}{{P_{m}}^{2}}$

[0871] which is identical to the inverse filter.

[0872] 3.2 The Maximum a Posteriori Filter

[0873] This filter is obtained by finding ƒ_(ij) such that ${{\frac{\partial}{\partial f_{kl}}\ln \quad {P\left( s_{ij} \middle| f_{ij} \right)}} + {\frac{\partial}{\partial f_{kl}}\ln \quad {P\left( f_{ij} \right)}}} = 0$

[0874] Consider the following models for the PDFs

[0875] (i) Gaussian statistics for the noise ${P\left( s_{ij} \middle| f_{ij} \right)} = {\exp \left\lbrack {{- \frac{1}{\sigma_{n}^{2}}}{\sum\limits_{i}{\sum\limits_{j}\left( {s_{ij} - {\sum\limits_{n}{\sum\limits_{m}{p_{{i - n},{j - m}}f_{n\quad m}}}}} \right)^{2}}}} \right\rbrack}$

[0876] (ii) Gaussian statistics for the object ${P\left( f_{ij} \right)} = {\exp \left( {{- \frac{1}{\sigma_{f}^{2}}}{\sum\limits_{i}{\sum\limits_{j}f_{ij}^{2}}}} \right)}$

[0877] By substituting these expressions for P(s_(ij)|ƒ_(ij)) and P(ƒ_(ij)) into the equation above, we obtain ${{\frac{2}{\sigma_{n}^{2}}{\sum\limits_{i}{\sum\limits_{j}{\left( {s_{ij} - {\sum\limits_{n}{\sum\limits_{m}{p_{{i - n},{j - m}}f_{n\quad m}}}}} \right)p_{{i - k},{j - l}}}}}} - {\frac{2}{\sigma_{f}^{2}}f_{kl}}} = 0$

[0878] Rearranging, we may write this result in the form ${s_{ij} \odot {\odot p_{ij}}} = {{\frac{\sigma_{n}^{2}}{\sigma_{f}^{2}}f_{ij}} + {\left( {p_{ij} \otimes {\otimes f_{ij}}} \right) \odot p_{ij}}}$

[0879] In Fourier space, this equation becomes ${S_{ij}P_{ij}^{*}} = {{\frac{1}{\Gamma^{2}}F_{ij}} + {{P_{ij}}^{2}F_{ij}}}$ where $\Gamma = \frac{\sigma_{f}}{\sigma_{n}}$

[0880] The MAP filter for Gaussian statistics is therefore given by ${{MAP}\quad {Filter}} = \frac{P_{ij}^{*}}{{P_{ij}}^{2} + {1/\Gamma^{2}}}$

[0881] Note, that this filter is the same as the Wiener filter under the assumption that the power spectra of the noise and object are constant. Also, note that ${\lim\limits_{\sigma_{n}->0}\left( {{MAP}\quad {Filter}} \right)} = {{ML}\quad {Filter}}$

[0882] 4. Reconstruction of Bandlimited Images

[0883] A bandlimited function is a function whose spectral bandwidth is finite. Most real signals and images are bandlimited functions. This leads one to consider the problem of how the bandwidth and hence the resolution of a bandlimited image, can be increased synthetically using digital processing techniques. In other words, how can we extrapolate the spectrum of a bandlimited function from an incomplete sample.

[0884] Solutions to this type of problem are important in image analysis where a resolution is needed that is not an intrinsic characteristic of the image provided and is difficult or even impossible to achieve experimentally. The type of resolution that is obtained by spectral extrapolation is referred to as super resolution.

[0885] Because sampled data are always insufficient to specify a unique solution and since no algorithm is able to reconstruct equally well all characteristics of an image, it is essential that the user is able to play a role in the design and execution of an algorithm and incorporate maximum knowledge of the expected features. This allows optimum use to be made of the available data and the users experience, judgement and intuition. Hence, an important aspect of practical solutions to the spectral extrapolation problem is the incorporation of a priori information on the structure of an object.

[0886] In this section, an algorithm is discussed which combines a priori information with the least squares principle to reconstruct a two dimensional function from limited (i.e. incomplete) Fourier data. This algorithm is essentially a modified version of the Gerchberg-Papoulis algorithm to accommodate a user defined weighting function.

[0887] 4.1 The Gerchberg-Papoulis Method

[0888] Let us consider the case where we have an image ƒ(x, y) characterized by a discrete spectrum F_(nm) which is composed of a finite number of samples: ${- \frac{N}{2}} \leq n \leq {\frac{N}{2} - \frac{M}{2}} \leq m \leq \frac{M}{2}$

[0889] These data are related to the image by the equation F_(n  m) = ∫_(−X)^(X)∫_(−Y)^(Y)f(x, y)e^(−i(k_(n)x + k_(m)y))xy

[0890] Here, ƒ is assumed to be of finite support X and Y, i.e.,

|x|≦X and |y|≦Y

[0891] and k_(n), k_(m) are discrete spatial frequencies. With this data, we can define the BandLimited function ${f_{BL}\left( {x,y} \right)} = {\sum\limits_{n}{\sum\limits_{m}{F_{n\quad m}^{{({{k_{n}x} + {k_{m}y}})}}}}}$

[0892] which is related to F_(nm) by a two-dimensional Fourier Series. Our problem is to reconstruct ƒ given F_(nm) or equivalently, ƒ_(BL). In this section, a solution to this problem is presented using the least squares principle. First, we consider a model for an estimate {circumflex over (ƒ)} of ƒ given by $\begin{matrix} {{\hat{f}\left( {x,y} \right)} = {\sum\limits_{n}{\sum\limits_{m}{A_{n\quad m}^{{({{k_{n}x} + {k_{m}y}})}}}}}} & \left( {4.1} \right) \end{matrix}$

[0893] This model is just a two-dimensional Fourier series representation of the object. Given this model, our problem is reduced to that of fining the coefficients A_(nm).

[0894] Using the least squares method, we compute A_(nm) by minimizing the mean square error e = ∫_(−X)^(X)∫_(−Y)^(Y)f(x, y) − f̂(x, y)²xy

[0895] This error is a minimum when $\frac{\partial e}{\partial A_{n\quad m}} = 0$

[0896] Differentiating, we obtain (see Appendix A) $\begin{matrix} {\frac{\partial e}{\partial A_{pq}} = {\frac{\partial}{\partial A_{pq}}{\int_{- X}^{X}{\int_{- Y}^{Y}{{{{f\left( {x,y} \right)} - {\sum\limits_{n}{\sum\limits_{m}{A_{n\quad m}^{{({{k_{n}x} + {k_{m}y}})}}}}}}}^{2}{x}{y}}}}}} \\ {= {\int_{- X}^{X}{\int_{- Y}^{Y}{\left( {{f\left( {x,y} \right)} - {\sum\limits_{n}{\sum\limits_{m}{A_{n\quad m}^{{({{k_{n}x} + {k_{m}y}})}}}}}} \right)^{- {{({{k_{p}x} + {k_{q}y}})}}}{x}{y}}}}} \end{matrix}$

[0897] Thus, e is a minimum when $\begin{matrix} {{\int_{- X}^{X}{\int_{- Y}^{Y}{{f\left( {x,y} \right)}^{- {{({{k_{p}x} + {k_{q}y}})}}}{x}{y}}}} =} \\ {\sum\limits_{n}{\sum\limits_{m}{A_{n\quad m}{\int_{- X}^{X}{\int_{- Y}^{Y}{^{{- {{({k_{p} - k_{n}})}}}x}^{{- {{({k_{q} - k_{m}})}}}y}{x}{y}}}}}}} \end{matrix}$

[0898] The left hand side the above equation is just the Fourier data F_(pq). Hence, after evaluating the integrals on he right hand side, we get $\begin{matrix} {F_{pq} = {4{XY}{\sum\limits_{n}{\sum\limits_{m}{A_{n\quad m}\sin \quad {c\left\lbrack {\left( {k_{p} - k_{n}} \right)X} \right\rbrack}\sin \quad {c\left\lbrack {\left( {k_{q} - k_{m}} \right)Y} \right\rbrack}}}}}} & (4.2) \end{matrix}$

[0899] The estimate {circumflex over (ƒ)}(x, y) can be computed by solving the equation above for the coefficients A_(nm). This is a two-dimensional version of the Gerchberg-Papoulis method and is a least squares approximation of ƒ(x, y).

[0900] 4.2 Incorporation of a Priori Information

[0901] Since we have considered an image ƒ of finite support, we can write equation (4.1) in the following ‘closed form’: $\begin{matrix} {{\hat{f}\left( {x,y} \right)} = {{w\left( {x,y} \right)}{\sum\limits_{n}{\sum\limits_{m}{A_{n\quad m}^{{({{k_{n}x} + {k_{m}y}})}}}}}}} & (4.3) \end{matrix}$

[0902] where ${w\left( {x,y} \right)} = \left\{ \begin{matrix} {1,} & {{{x} \leq X},} & {{y} \leq Y} \\ {0,} & {{{x} > X},} & {{y} > Y} \end{matrix} \right.$

[0903] Writing it in this form, we observe that ω (i.e. essentially the values of X and Y) represents a simple but crucial form of a priori information. This information is required to compute the sinc functions given in equation (4.2) and hence the coefficients A_(nm). Note, that the sinc functions (in particular the zero locations) are sensitive to the precise values of X and Y and hence small errors in X and Y can dramatically effect the computation of A_(nm). In other words, equation (4.2) is ill-conditioned.

[0904] The algebraic form of equation (4.3) suggests incorporating further a priori information into the ‘weighting function’ ω in addition to the support of the object ƒ. We therefore consider an estimate of the form ${\hat{f}\left( {x,y} \right)} = {{w\left( {x,y} \right)}{\sum\limits_{n}{\sum\limits_{m}{A_{n\quad m}^{{({{k_{n}x} + {k_{m}y}})}}}}}}$

[0905] where ω is now a generalized weighting function composed of limited a priori information on the structure of ƒ. If we now employ a least squares method to find A_(nm) based on the previous mean square error function, we obtain the following equation ${\int_{- X}^{X}{\int_{- Y}^{Y}{{f\left( {x,\quad y} \right)}{w\left( {x,\quad y} \right)}^{- {{({{k_{p}x} + {k_{q}y}})}}}\quad {x}\quad {y}}}} = {\sum\limits_{\quad n}^{\quad}\quad {\sum\limits_{\quad m}^{\quad}\quad {A_{nm}{\int_{- X}^{X}{\int_{- Y}^{Y}{\left\lbrack {w\left( {x,\quad y} \right)} \right\rbrack^{2}^{{- {{({k_{p} - k_{n}})}}}x}\quad ^{{- {{({k_{q} - k_{m}})}}}y}{x}\quad {y}}}}}}}$

[0906] The problem with this result is that the data on the left hand side is not the same as the Fourier data provided F_(pq). In other words, the result is not ‘data consistent’. To overcome this problem we introduce a modified version of the least square method which involves minimizing the error $\begin{matrix} {e = \left. {\int_{- X}^{X}\int_{- Y}^{Y}} \middle| {{f\left( {x,\quad y} \right)} - {\hat{f}\left( {x,\quad y} \right)}} \middle| {}_{2}{\frac{1}{w\left( {x,\quad y} \right)}\quad {x}\quad {y}} \right.} & \left. \text{(4.4} \right) \end{matrix}$

[0907] In this case, we find that e is a minimum when $\begin{matrix} {F_{pq} = {\sum\limits_{\quad n}^{\quad}\quad {\sum\limits_{\quad m}^{\quad}\quad {A_{nm}W_{p - {n,\quad q} - m}}}}} & \text{(4.5)} \end{matrix}$

[0908] where W_(p − n,  q − m) = ∫_(−X)^(X)∫_(−Y)^(Y)w(x,  y)^(−(k_(p) − k_(n))x)^(−(k_(q) − k_(m))y)  x  y

[0909] Equation (4.5) is data consistent, the right hand side of this equation being a discrete convolution of A_(nm) with W_(nm). Hence, using the notation for convolution, we may write this equation in the form

F _(nm) =A _(nm)

W _(nm)

[0910] Using the convolution theorem, in real space, this equation becomes

ƒ_(BL)(x, y)=a(x, y)ω_(BL)(x, y)

[0911] where ${f_{BL}\left( {x,\quad y} \right)} = {\sum\limits_{\quad n}^{\quad}\quad {\sum\limits_{\quad m}^{\quad}\quad {F_{nm}^{{({{k_{n}x} + {k_{m}y}})}}}}}$ ${w_{BL}\left( {x,\quad y} \right)} = {\sum\limits_{\quad n}^{\quad}\quad {\sum\limits_{\quad m}^{\quad}\quad {W_{nm}^{{({{k_{n}x} + {k_{m\quad}y}})}}}}}$ and ${a\left( {x,\quad y} \right)} = {\sum\limits_{\quad n}^{\quad}\quad {\sum\limits_{\quad m}^{\quad}\quad {A_{nm}^{{({{k_{n}x} + {k_{m}y}})}}}}}$

[0912] Now, since ${\hat{f}\left( {x,\quad y} \right)} = {{{w\left( {x,\quad y} \right)}{\sum\limits_{\quad n}^{\quad}\quad {\sum\limits_{\quad m}^{\quad}\quad {A_{nm}^{{({{k_{n}x} + {k_{m}y}})}}}}}} = {{w\left( {x,\quad y} \right)}{a\left( {x,\quad y} \right)}}}$

[0913] we obtain the simple algebraic result ${\hat{f}\left( {x,\quad y} \right)} = {\frac{w\left( {x,\quad y} \right)}{w_{BL}\left( {x,\quad y} \right)}{f_{BL}\left( {x,\quad y} \right)}}$

[0914] Here ω_(BL) is a bandlimited weighting function, bandlimited by the same extent as ƒ_(BL).

[0915] The algorithm presented above is based on an inverse weighted least squares error [i.e. equation (4.4)]. It is essentially an adaption of the Gerchberg-Papoulis method, modified to:

[0916] (i) accommodate a generalized weighting function ω(x, y);

[0917] (ii) provide data consistency [i.e. equation (4.5)].

[0918] The weighting function ω(x, y) can be used to encode as much information as is available on the structural characteristics of ƒ(x, y). Since equation (4.4) involves 1/ω(x, y), ω(x, y) must be confined to being a positive non-zero function. We can summarize this algorithm in the form ${reconstruction} = \frac{{{bandlimited}\quad {image} \times a\quad {priori}\quad {information}}\quad}{{bandlimited}\quad a\quad {priori}\quad {information}}$

[0919] Clearly, the success of this algorithm depends on the quality of the a priori information that is available, just as the performance of the Wiener filter or MEM depends upon a priori information on the functional form of the Point Spread Function.

[0920] 5. Reconstruction From Projections: Computed Tomography (CT)

[0921] Computed Tomography (CT) is used in a wide range of applications, most notably for medical imaging (the CT-scan). The mathematical basis of this mode of imaging is compounded in an integral transform called the Radon transform, named after the Austrian mathematician, Johannes Radon. Since the development of the CT-scan, the Radon transform has found applications in many diverse subject areas; from astrophysics to seismic exploration and more recently, computer vision.

[0922] This section is concerned with the Radon transform and some of the numerical techniques that can be used to compute it. Particular attention is focused on three methods of computing the inverse Radon transform using (i) back-projection and deconvolution, (ii) filtered back-projection and (iii) the central slice theorem.

[0923] 5.1 Computed Tomography

[0924] In 1917, J Radon published a paper in which he showed that the complete set of one-dimensional projections obtained from a continuous two-dimensional function, contains all the information required to reconstruct this same function. A projection is obtained by integrating a 2D function over a set of parallel lines and is characteristic of its angle of rotation in the 2D plane. The Radon transform provides one of the most successful theoretical basis for imaging both the two-dimensional and three-dimensional internal structure of inhomogeneous objects. Consequently, it has a wide range of applications.

[0925] The mathematics of projection tomography considers continuous functions. The inverse problem therefore involves the reconstruction of an object function from an infinite set of projections. In practice, only a finite number of projections can be taken. Hence, only an approximation to the original function can be obtained by computing the inverse Radon transform digitally. The accuracy of this approximation can be improved by increasing the number of projections used and employing image enhancement techniques.

[0926] 5.2 Some Applications of Computed Tomography

[0927] Due to the rapid advances in the field of computed tomography, the horizons of radiology have expanded beyond traditional X-ray radiography to embrace X-ray CT, microwave CT, ultrasonic CT and emission CT to name but a few. All these subject applications are based on the Radon transform.

[0928] Other areas were the Radon transform has been applied are in computer vision (linear feature recognition) and astronomy (e.g. mapping solar microwave emissions).

[0929] X-ray Tomography

[0930] The X-ray problem was the prototype application for the active reconstruction of images. The term ‘active’, arises from the use of external probes to collect information about projections. An alternative approach (i.e. passive reconstruction), does not require external probes.

[0931] The X-ray process involves recording X-rays on a photographic plate as they emerge from a three dimensional object after having been attenuated by an amount that is determined by the path followed by a particular ray through the object. This gives an image known as a radiograph. Each grey level of this type of image is determined by the combined effect of all the absorbing elements that lie along the path of an individual ray.

[0932] We can consider a three dimensional object to be composed of two dimensional slices which are stacked, one on top of the other. Instead of looking at the absorption of X-rays over a composite stack of these slices, we can choose to study the absorption of X-rays as they pass through an individual slice. To do this, the absorption properties over the finite thickness of the slice must be assumed to be constant. The type of imaging produced by looking at the material composition and properties of a slice is known as a tomography. The absorption of X-rays as they pass through a slice provides a single profile of the X-ray intensity. This profile is characteristic of the material in the slice.

[0933] A single profile of the X-ray intensity associated with a particular slice only provides a qualitative account of the distribution of material in a slice. In other words, we only have one-dimensional information about a two dimensional object just as in conventional X-ray radiography, we only have two-dimensional information (an image) about a three dimensional object. A further degree of information can be obtained by changing the direction of the X-ray beam. This is determined by the angle of rotation θ of a slice relative to the source or equivalently, the location of the source relative to the slice. Either way, further information on the composition of the material may be obtained by observing how the X-ray intensity profile varies with the angle of rotation.

[0934] In X-ray imaging, computer tomography provides a quantitative image of the absorption coefficient of X-rays with initial intensity I₀. If an X-ray passes through a homogeneous material with attenuation coefficient α over a length L, then the resulting intensity is just

I=I ₀exp(−αL)

[0935] If the material is inhomogeneous, then we can consider the path along which the ray travels to consist of different attenuation coefficients α_(i) over elemental lengths Δl_(i). The resulting intensity is given by

I=I ₀exp[−(α₁ Δl ₁+α₂ Δl ₂+ . . . +α_(N) Δl _(N))]

[0936] where ${\sum\limits_{i = 1}^{N`}\quad {\Delta \quad l_{i}}} = L$

[0937] As Δl_(i)→0, this result becomes I = I₀exp (−∫_(L)  α  l)

[0938] By computing the natural logarithm of I/I₀, we obtain the data P = ∫_(L)  α  l where $P = {- {\ln \left( \frac{I}{I_{0}} \right)}}$

[0939] The value of the intensity and therefore P depends upon the point where the ray passes through the object which shall be denoted by z. It also depends on the orientation of the object about its centre θ. Hence, by adjusting the source of X-rays and the orientation of the attenuating object, a full sequence of projections can be obtained which are related to the two dimensional attenuation coefficient α(x, y) by the equation P(z,  θ) = ∫_(L)  α(x,  y)  l

[0940] where dl is an element of a line passing through the function α(x, y) and L depends on z and θ. This function is a line integral through the two- dimensional X-ray absorption coefficient α(x, y). It is a projection of this function and characteristic of θ. If P is known for all values of z and θ, then P is the Radon transform of α and that α can be reconstructed from P by employing the inverse Radon transform.

[0941] Advances in CT scanning have been closely related to the development of faster and more effective algorithms in conjunction with technological improvements in hardware. The modern scanned images have come a long way since the original body scanning pictures produced by Hounsfield in 1970. Major advances have occurred with the development of a new generation of scanners called Dynamic Spatial Reconstructors. These machines provide two very powerful new dimensions to computed tomography; high resolution and synchronous (fully three dimensional) scanning. Their capabilities have revolutionized present day medical imaging capabilities. For example, they allow the dynamic study of anatomical structural and functional relationships of moving organ systems such as heart, lungs and circulatory systems. These new generation CT systems are now capable of simultaneous three dimensional reconstructions of vascular anatomy and circulatory dynamics in any region of the body.

[0942] Ultrasonic Computed Tomography

[0943] As in X-ray computed tomography, the aim of Ultrasonic Computed Tomography (UCT) is to reconstruct transverse cross sectional images from projection data obtained when a probe (ultrasound in this case) passes through the object. Under appropriate conditions, the probe may be used to determine ultrasonic attenuation and ultrasonic velocity distributions of an inhomogeneous object. The latter case is based on emitting short pulses of ultrasound and recording the time taken for each pulse to reach a detector. If the material in which the pulse propagates is homogeneous, the ‘time-of-flight’ for the pulse to traverse the distance between source and detector along a line L, is given by the expression $t = \frac{L}{\upsilon}$

[0944] where v is the velocity at which the pulse propagates through the material. If the material is homogeneous along L, then the time of flight becomes ${t\left( {z,\quad \theta} \right)} = {\int_{L}^{\quad}\frac{l}{\upsilon \left( {x,\quad y} \right)}}$

[0945] A tomogram of the inhomogeneous velocity of the material can then be obtained by inverting the above equation. This result is the basis of UCT imaging.

[0946] In addition to performing ‘time-of-flight’ experiments, the decay in amplitude of the ultrasonic probe can be measured. This allows a tomogram of the ultrasonic absorption of a material to be obtained. Images of this kind may be interpreted as maps of the viscosity of the material since it is the viscous nature of a material that is responsible for absorbing ultrasonic radiation. By using electromagnetic probes, we can obtain information about the spatial distribution of the dielectric characteristics of a material using an appropriate time of flight experiment or the conductivity of a material by measuring the decay in amplitude of the electromagnetic field.

[0947] Emission Computed Tomography

[0948] Emission Computed Tomography (ECT) refers to the use of radioactive isotopes as passive probes. The passive approach does not require external probes. There is a probe involved but it comes from the object itself. In the case of ECT, we determine the distribution (location and concentration) of some radioactive isotope inside an object by studying the emitted photons.

[0949] There are two basic types of ECT depending on whether the isotope utilized is a single photon emitter, such as iodine-131, or a positron (e⁺ or β⁺) emitter, such as carbon-11. When a β⁺ emitter is used, the ejected positron loses most of its energy over a few millimetres. As it comes to rest, it annihilates with a nearby electron resulting in the formation of two γ-ray photons which travel in opposite directions along the same path. If a ring of detectors is placed around the object and two of the detectors simultaneously record γ-ray photons, then the radio-nucleide is known to lie somewhere along the line between the detectors. The reconstruction problem can therefore be cast in terms of the Radon transform where a complete set of projections is a measure of the total radio-nucleide emission.

[0950] The use of ECT has provided a dramatic advancement in nuclear medicine including investigations into brain and heart metabolisms. Other possibilities include new methods for cancer detection. In engineering applications ECT has be used to investigate the distribution of oil in different engines for example by doping the oil with a suitable radio-nucleide

[0951] Diffraction Tomography

[0952] Diffraction tomography is a method of imaging which is based on reconstructing an object from measurement on the way in which it diffracts a wavefield probe. Unlike X-ray CT, this involves the use of a radiation field whose wavelength is the same order of magnitude as the object (e.g. ultrasound, with a wavelength ˜10⁻³ m for example and millimetric microwaves). Two methods have been researched to date using (i) CW (Continuous Wave) fields and (ii) pulsed fields. In the latter case, it can been shown that the time history of the diffraction pattern set-up by a short pulse of radiation is related to the internal structure of the diffracting object by the Radon transform. Hence, in principle, the object can be reconstructed by employing algorithms for computing the inverse Radon transform.

[0953] Computer Vision

[0954] An interesting applications of the Radon transform has been in the area of computer vision. Computer vision is concerned with the analysis and recognition of features in an images. It is particularly important to manufacturing industry for automatic inspection and for military applications (e.g. guided weapons systems and automatic targeting).

[0955] The projection transform utilized in computer vision is the Hough transform. The Hough transform was derived independently from the Radon transform in the early 1960s. However, the Hough transform is just a special case of the Radon transform and is used in the identification of lines in digital images.

[0956] The Radon transform of a function concentrated at a point, described by the 2D delta function

δ²(x−x ₀ , y−y ₀)=δ(x−x ₀)δ(y−y ₀)

[0957] yields a sinusoidal curve

p=x ₀ cos θ+y ₀ sin θ

[0958] in pθ-plane. All co-linear points in the xy-plane along a line determined by fixed values θ and p, map to sinusoidal curves in the pθ-plane and intersect in the same point. Thus, if we choose a suitable method for plotting the projections of a digital image as a function of θ and p, it follows that the Radon transform may be regarded as a line to point transformation. By utilizing the line detection properties of the Radon transform, the edges of manufactured objects can be analysed against there known characteristics. From these characteristics, identification of faults can be spotted.

[0959] Other areas of science which are realising the important properties of the Radon transformation include the fields of astronomy, optics, and nuclear magnetic resonance.

[0960] 5.3 The Radon Transform

[0961] In this section, the Radon transform of a two-dimensional ‘Object Function’ ƒ(x, y) on a Euclidean space will be discussed. To start with, the geometry of the Radon transform shall be presented to provide a conceptual guide to its operation and transformation properties. This will be followed by a rigourous mathematical derivation of the Radon transform which is based entirely on the analytical properties of the two-dimensional Dirac delta function.

[0962] A Conceptual Guide to the Radon Transform

[0963] Consider an inhomogeneous object of compact support, defined in a two-dimensional Cartesian space by the object function ƒ(x, y). The mapping defined by the projection or line integral of ƒ along all possible lines L can be written in the form P = ∫_(L)  f(x,  y)  l

[0964] where dl is an increment of length along L.

[0965] The projection P depends on two variables; the angle of rotation of the object in the xy-plane and the distance z of the line of integration L from the centre of the object. Hence, the equation above represents a mapping from (x, y) cartesian co-ordinates to (z, θ) polar coordinates. This can be indicated explicitly by writing $\begin{matrix} {{P\left( {z,\quad \theta} \right)} = {\int_{L{({x,\quad \theta})}}^{\quad}\quad {{f\left( {x,\quad y} \right)}{l}}}} & (5.1) \end{matrix}$

[0966] If P(z, θ) is known for all z and θ, then P(z, θ) is the Radon transform of ƒ(x, y), i.e.

P={circumflex over (R)}ƒ

[0967] where {circumflex over (R)} is the Radon transform operator.

[0968] There are a number of equivalent ways of attempting to define the Radon transform operator {circumflex over (R)}. The approach used here is one of defining {circumflex over (R)} in terms of a two-dimensional integral transform, the Kernel of this transform being the Dirac delta function which allows a range of analytical properties to be exploited. The main mathematical results are defined in the following section, where the function P (the Radon transform of the object function ƒ) is shown to be related to ƒ by

P(z, θ)={circumflex over (R)}ƒ(x, y)=∫ƒ(r)δ(z−{circumflex over (n)}·r)d ² r

[0969] where {circumflex over (n)} is a unit vector which points in a direction perpendicular to the family of parallel lines of integration L. The integral in the equation above is taken over the spatial extent of the object function ƒ which is taken to have finite support.

[0970] Noting that

{circumflex over (n)}·r=x cos θ+y sin θ

[0971] the equation for the projections P(z, θ) becomes

P(z, θ)=∫∫ƒ(x, y)δ(z−x cos θ−y sin θ)dxdy

[0972] This function only exits when

z=x cos θ+y sin θ

[0973] the delta function being zero otherwise. The equivalence between this definition for P and the one given by equation (5.1) becomes clear if we consider a projection to be just the family of line integrals through the object function when it has been rotated about its axis by an angle θ. To illustrate this, consider the case when θ=0. In this case, using the equation for P above we get

P(z, 0)=∫∫ƒ(x, y)δ(z−x)dxdy=∫ƒ(z, y)dy

[0974] Here, the projection P(z, 0) is obtained by integrating the object over y for all values of the projection co-ordinate z. As a second example, consider the case when θ=π/2, giving

P(z, π/2)=∫∫ƒ(x, y)δ(z−y)dxdy=∫ƒ(x, y)dx

[0975] In this case, the projection is obtained by integrating along x for all values of z.

[0976] The material discussed here is concerned with methods of computing both the forward and inverse Radon transform. The former case involves computing the integral given in equation (5.1). The inverse Radon transform is concerned with solving the problem of reconstructing the object function ƒ(x, y) given the projections P(z, θ) for all values of z and θ, i.e. inverting the integral transform

P={circumflex over (R)}ƒ

[0977] giving

ƒ={circumflex over (R)}⁻¹P

[0978] where R⁻¹ is the inverse Radon transform operator. The problem is therefore compounded in developing accurate and efficient methods for computing {circumflex over (R)}⁻¹ using a digital computer.

[0979] 5.4 Derivation of the Radon Transform

[0980] In this section, the Radon transform shall be derived using the analytical properties of the two-dimensional Dirac delta function alone. Various results shall be employed with the aim of expressing the two-dimensional delta function in a prescribed integral form. The result will then be combined with the sampling property of the two-dimensional delta function to obtain a formal definition of the Radon transform and its inverse. Unless stated otherwise, all integral lie between −∞ and ∞.

[0981] We begin by defining the two-dimensional delta function,

δ²(r−r ₀)=δ(x−x ₀)δ(y−y ₀)

[0982] where

r={circumflex over (x)}x+ŷy

[0983] and

r ₀ ={circumflex over (x)}x ₀+ŷy₀

[0984] {circumflex over (x)} and ŷ being unit vectors in the x and y direction respectively. We now employ the following integral representation for the two-dimensional delta function, $\begin{matrix} {{{\delta^{2}\left( {r - r_{0}} \right)} = {{\frac{1}{\left( {2\pi} \right)^{2}}{\int{\int_{\quad}^{\quad}{{\exp \left\lbrack {\quad {k \cdot \left( {r_{0} - r} \right)}} \right\rbrack}\quad {^{2}k}}}}} = {\frac{1}{\left( {2\pi} \right)^{2}}{\int{\int{{\exp \left( {\quad k\quad {\hat{n} \cdot r_{0}}} \right)}{\exp \left( {{- }\quad k\quad {\hat{n} \cdot r}} \right)}{^{2}k}}}}}}}{where}{{\hat{n} = \frac{k}{k}};{k = |k|}}} & \text{(5.2)} \end{matrix}$

[0985] Also, we introduce the relationship (a consequence of the sampling property of the delta function)

∫δ(z−{circumflex over (n)}·r) exp(−ikz)dz=exp(−ik{circumflex over (n)}·r)

[0986] Substituting this result in to equation (5.2), we obtain ${\delta^{2}\left( {r - r_{0}} \right)} = {\frac{1}{\left( {2\pi} \right)^{2}}{\int{\int\quad {{^{2}k}\quad {\exp \left( {\quad k\quad {\hat{n} \cdot r_{0}}} \right)}{\int{{\delta \left( {z - {\hat{n} \cdot r}} \right)}{\exp \left( {{- }\quad {kz}} \right)}{z}}}}}}}$

[0987] At this stage, it is useful to convert to polar co-ordinates d²k=kdkdθ, giving (after combining the exponential terms) ${\delta^{2}\left( {r - r_{0}} \right)} = {\frac{1}{\left( {2\pi} \right)^{2}}{\int_{0}^{2\pi}\quad {{\theta}{\int_{0}^{\infty}\quad {{k}\quad k{\int\quad {{z}\quad {\exp\left\lbrack {\quad {k\left( {{\hat{n} \cdot r_{0}} - z} \right)}} \middle| {\delta \left( {z - {\hat{n} \cdot r}} \right)} \right.}}}}}}}}$

[0988] We can then write the two-dimensional delta function in the following alternative form, ${\delta^{2}\left( {r - r_{0}} \right)} = \left. {\frac{1}{\left( {2\pi} \right)^{2}}{\int_{0}^{\pi}\quad {\int{k}}}} \middle| k \middle| {\int\quad {{z}\quad {\exp \left\lbrack {\quad {k\left( {{\hat{n} \cdot r_{0}} - z} \right)}} \right\rbrack}{\delta \left( {z - {\hat{n} \cdot r}} \right)}}} \right.$

[0989] If we now employ the sgn function defined by ${{sgn}(k)} = \left\{ \begin{matrix} {{+ 1},} & {{k \geq 0};} \\ {{- 1},} & {k > 0.} \end{matrix} \right.$

[0990] then |k | can be re-written as k sgn(k) so that $\begin{matrix} {{\delta^{2}\left( {r - r_{0}} \right)} = {\frac{1}{\left( {2\pi} \right)^{2}}{\int_{0}^{\pi}{{\theta}{\int{{k}\quad {{sgn}(k)}k{\int{{z}\quad {\exp\left\lbrack {{k}\left( {{\hat{n} \cdot r_{0}} - z} \right)} \right\rbrack}{\delta \left( {z - {\hat{n} \cdot r}} \right)}}}}}}}}} & \text{(5.3)} \end{matrix}$

[0991] We can progress further by utilizing the result $\begin{matrix} {{\frac{\partial\quad}{\partial z}{\delta \left( {z - {\hat{n} \cdot r}} \right)}} = {\frac{\partial\quad}{\partial z}\left( {\frac{1}{2\pi}{\int{{\exp \left\lbrack {\quad {k\left( {z - {\hat{n} \cdot r}} \right)}} \right\rbrack}{k}}}} \right)}} \\ {= {{\quad {k\left( {\frac{1}{2\pi}{\int{{\exp \left\lbrack {{- }\quad {k\left( {z - {\hat{n} \cdot r}} \right)}} \right\rbrack}{k}}}} \right)}} - {\quad k\quad {\delta \left( {z - {\hat{n} \cdot r}} \right)}}}} \end{matrix}$

[0992] After multiplying both sides of this equation by exp(−ikz) and integrating over z, we obtain the relation ${{k{\int{{\delta \left( {z - {\hat{n} \cdot r}} \right)}\exp}}} - {\left( {\quad {kz}} \right){z}}} = {{- i}{\int{\left( {\frac{\partial\quad}{\partial z}{\delta \left( {z - {\hat{n} \cdot r}} \right)}} \right){\exp \left( {\quad {kz}} \right)}{z}}}}$

[0993] Substituting this result back into equation (5.3) and changing the order of integration, we get ${\delta^{2}\left( {r - r_{0}} \right)} = {\frac{- i}{\left( {2\pi} \right)^{2}}{\int_{0}^{\pi}{{\theta}{\int{{{z\left( {\frac{\partial\quad}{\partial z}{\delta \left( {z - {\hat{n} \cdot r}} \right)}} \right)}}{\int{{k}\quad {{sgn}(k)}{\exp \left\lbrack {\quad {k\left( {{\hat{n} \cdot r_{0}} - z} \right)}} \right\rbrack}}}}}}}}$

[0994] Finally, we use the result ${\int{\frac{1}{u}{\exp \left( {{- }\quad {ku}} \right)}{u}}} = {- {{\pi sgn}(k)}}$

[0995] where u is a dummy variable. The left hand side of this equation is just the Fourier transform of 1/u. Hence, on taking the inverse Fourier transform, we obtain $\frac{1}{u} = {{\frac{1}{2\pi}{\int{\left( {{- }\quad \pi} \right){{sgn}(k)}{\exp \left( {\quad {ku}} \right)}{k}}}} = {{- \frac{i}{2}}{\int{{{sgn}(k)}{\exp \left( {\quad {ku}} \right)}{k}}}}}$

[0996] or, after rearranging, ${\int{{k}\quad {{sgn}(k)}{\exp \left( {\quad {ku}} \right)}}} = \frac{2i}{u}$

[0997] Substituting this result back into the last expression for δ², we obtain our desired integral form for the two-dimensional delta function, i.e. ${\delta^{2}\left( {r - r_{0}} \right)} = {{- \frac{1}{2\pi^{2}}}{\int_{0}^{\pi}{{\theta}{\int{{z}\frac{1}{z - {\hat{n} \cdot r_{0}}}\frac{\partial\quad}{\partial z}{\delta \left( {z - {\hat{n} \cdot r}} \right)}}}}}}$

[0998] This expression for the two-dimensional delta function allows us to derive both the forward and inverse Radon transforms relatively easily. This can be done by using the sampling property of the two-dimensional delta function, namely, f(r₀) = ∫f(r)δ²(r − r₀)²r

[0999] Substituting the expression for δ² given above into this equation and interchanging the order of integration, we get $\begin{matrix} \begin{matrix} {{f\left( r_{0} \right)} = {- {\int{{f(r)}\frac{1}{2\pi^{2}}{\int_{0}^{\pi}\quad {{\theta}{\int\quad {{z}\frac{1}{z - {\hat{n} \cdot r_{0}}}\frac{\partial\quad}{\partial z}{\delta \left( {z - {\hat{n} \cdot r}} \right)}\quad {^{2}r}}}}}}}}} \\ {= {{- \frac{1}{2\pi^{2}}}{\int_{o}^{\pi}\quad {{\theta}{\int\quad {{z}\frac{1}{z - {\hat{n} \cdot r_{0}}}\frac{\partial\quad}{\partial z}{P\left( {\hat{n},\quad z} \right)}}}}}}} \end{matrix} & (5.4) \end{matrix}$

[1000] where

P({circumflex over (n)}, z)={circumflex over (R)}ƒ(r)=∫ƒ(r)δ(z−{circumflex over (n)}·r)d ² r

[1001] The function P is defined as the Radon transform of ƒ. The beauty of deriving the Radon transform in this way is that the inverse Radon transform is immediately apparent from equation (5.4), i.e. ${f(r)} = {{{\hat{R}}^{- 1}{P\left( {\hat{n},\quad z} \right)}} = {{- \frac{1}{2\pi^{2}}}{\int_{0}^{\pi}\quad {{\theta}{\int\quad {{z}\frac{1}{z - {\hat{n} \cdot r}}\frac{\partial\quad}{\partial z}{P\left( {\hat{n},\quad z} \right)}}}}}}}$

[1002] 5.5 Reconstruction Methods

[1003] The formula for reconstructing a function from its Radon transform is given by $\begin{matrix} {{f(r)} = {{{\hat{R}}^{- 1}{P\left( {\hat{n},\quad z} \right)}} = {{- \frac{1}{2\pi^{2}}}{\int_{0}^{\pi}\quad {{\theta}{\int\quad {{z}\frac{1}{z - {\hat{n} \cdot r}}\frac{\partial\quad}{\partial z}{P\left( {\hat{n},\quad z} \right)}}}}}}}} & \text{(5.5)} \end{matrix}$

[1004] This formula is always valid is cases where P is continuous over an infinite set of projections for all lines, rather than a discrete set. This result is compounded in the Indeterminacy Theorem which states that ‘A function of compact support in two dimensional Radon space is uniquely determined by an infinite set; but by no finite set of its projections’. Thus a digital reconstruction process based on equation (5.5) will only be an approximation of the actual object by non-unique approximations. In other words, although the unknown function cannot be reconstructed exactly, good approximations can be found by utilising an increasingly large number of projections.

[1005] This section is concerned with methods of computing the inverse Radon transform given by equation (5.5). The reconstruction methods presented are:

[1006] (i) Reconstruction by Filtered Back-Projection.

[1007] (ii) Reconstruction by Back-Projection and Deconvolution.

[1008] (iii) Reconstruction using the Projection Slice Theorem.

[1009] Theoretically, all these methods are completely equivalent and are essentially variations on equation (5.5). However, computationally, each method poses a different set of problems and requires an algorithm whose computational performance can vary significantly depending on the data type and its structure.

[1010] The first two reconstruction methods listed above use the back-projection process as an intermediate step and are classified according to whether filtering is applied before (i) or after (ii) back-projection. In the following section, the back-projection process is discussed.

[1011] Back-Projection

[1012] The result B(x, y) of back-projecting a sequence of projections,

P(z,θ); z=x cos θ+y sin θ

[1013] can be written as ${B\left( {x,\quad y} \right)} = {\frac{1}{2\pi}{\int_{0}^{\pi}{{P\left( {{x\quad \cos \quad \theta} + {y\quad \sin \quad \theta,\quad \theta}} \right)}\quad {\theta}}}}$

[1014] In polar co-ordinates (r, θ′) where x=r cos θ′ and y=r sin θ′, we have $\begin{matrix} {{B\left( {r,\quad \theta} \right)} = {\frac{1}{2\pi}{\int_{0}^{\pi}{{P\left\lbrack {r\quad {\cos \left( {\theta^{\prime} - \theta} \right)},\quad \theta^{\prime}} \right\rbrack}\quad {\theta^{\prime}}}}}} & \text{(5.6)} \end{matrix}$

[1015] This result will be used later on. The function P(x cos θ+y sin θ, θ) is the distribution of P along the family of lines L. For a fixed value of θ, P(x cos θ+y sin θ, θ) is constructed by assigning the value of P at a point on z to all points along the original line of projection L. By repeating the process for all values of z and for each value of θ, the function P(x cos θ+y sin θ, θ) is obtained. Then, by summing all the functions P obtained for different values of θ between 0 and π, the back-projection function B is computed.

[1016] The back-projected function is a ‘blurred’ representation of the true object function. This necessitates a filtering operation to amplify the high frequency content of B. The required filter is obtained by performing a Fourier analysis of the operation $\int\quad {{z}\frac{1}{z - {\hat{n} \cdot r}}\frac{\partial\quad}{\partial z}{P\left( {\hat{n},\quad z} \right)}}$

[1017] in equation (5.5).

[1018] Reconstruction by Filtered Back-Projection

[1019] In this section, we analyse the reconstruction of ƒ from P in terms of an appropriate set of operators. This makes the task of formulating an appropriate filtering operation easier. To start with, let us re-write equation (5.5) in the form ${f(r)} = {{- \frac{1}{2\pi}}{\int_{0}^{\pi}\quad {{\theta}\frac{1}{\pi}{\int\quad {{z}\frac{1}{z - {\hat{n} \cdot r}}\frac{\partial\quad}{\partial z}{P\left( {\hat{n},\quad z} \right)}}}}}}$

[1020] Observe, that the integral over z is just the Hilbert transform of $\frac{\partial\quad}{\partial z}{P\left( {\hat{n},\quad z} \right)}$

[1021] If we denote the Hilbert transform operator by Ĥ, then we can write ${\hat{H}{\partial_{x}{P\left( {\hat{n},\quad z} \right)}}} = {\frac{1}{\pi}{\int{\frac{\partial_{x}{P\left( {\hat{n},\quad z} \right)}}{z - {\hat{n} \cdot r}}{z}}}}$

[1022] where, for convenience, $\partial_{x}{\equiv \frac{\partial\quad}{\partial z}}$

[1023] Note, that the Hilbert transform is just a convolution in z. Let us also denote the back-projection process by the operator {circumflex over (B)}, i.e. ${\hat{B}{f\left( {\hat{n},{\hat{n} \cdot r}} \right)}} = {\frac{1}{2\pi}{\int_{0}^{\pi}{{f\left( {\hat{n},{\hat{n} \cdot r}} \right)}{\theta}}}}$

[1024] Using these operators, equation (5.5) can be written in the form

ƒ(r)={circumflex over (R)} ⁻¹ P({circumflex over (n)}, z)={circumflex over (B)}Ĥ∂ _(x) P({circumflex over (n)}, z)

[1025] It is now clear that the inverse Radon transform as actually composed of three separate operations;

[1026] differentiation ∂_(x)

[1027] Hilbert transform Ĥ

[1028] Back-projection {circumflex over (B)}

[1029] We can illustrate this by introducing the operator equivalence relationship,

{circumflex over (R)}⁻¹={circumflex over (B)}Ĥ∂_(x)

[1030] Since the Hilbert transform is a linear functional, we have

Ĥ∂_(x)P=∂_(x)ĤP

[1031] so the order in which the first operations are carried out (prior to back-projecting) does not matter.

[1032] The computational method which involves the operation {circumflex over (B)}Ĥ∂_(x) is known as Filtered Back-projection, the filtering being a consequence of the operation Ĥ∂_(x). The exact form of the filter that is associated by this operation can be found by Fourier analysis. For a fixed value of {circumflex over (n)} we can write ${\hat{H}\partial_{z}}\quad = {\frac{1}{\pi z} \otimes \frac{\partial P}{\partial z}}$

[1033] where P is the projection obtained for a given {circumflex over (n)} and

is the convolution operation. To find the filter we need to Fourier analyse this expression. This can be done by using the results ${{\hat{F}}_{1}\frac{\partial P}{\partial z}} = {{ik}{\hat{F}}_{1}P}$ and ${{\hat{F}}_{1}\left( \frac{1}{\pi z} \right)} = {{- i}\quad {{sgn}(k)}}$

[1034] where {circumflex over (F)}₁ is the one-dimensional Fourier transform operator and k is the spatial frequency and gives

{circumflex over (F)}(Ĥ∂ _(x) P)=−i sgn(k)(ik{circumflex over (F)} ₁ P)

[1035] Now,

−i sgn(k)(ik)=sgn(k)k=|k|

[1036] Hence, the operation Ĥ∂_(x)P in real space is equivalent to applying the filter |k| in Fourier space. We can therefore write the reconstruction formula given by equation (5.5) in the form

ƒ(r)={circumflex over (B)}{circumflex over (F)} ₁ ⁻¹ [|k|{circumflex over (F)} ₁ P({circumflex over (n)}, z)]

[1037] Reconstruction by Back-Projection and Deconvolution

[1038] Another method of reconstructing ƒ from P can be acquired by considering the effect of back-projecting without filtering. The result will be some blurred version of the object function. The blurring inherent in such a reconstruction can be represented mathematically by the convolution of the object function with a PSF. By computing the functional form of the PSF, we can deconvolve, thus reconstructing the object.

[1039] The PSF can be computed by back-projecting the projections obtained from a single radially symmetric point located at (0, 0) described analytically by a two-dimensional delta function. The projection of a two-dimensional delta function, is a one-dimensional delta function and so in this case, we have,

P(x cos θ+y sin θ, θ)=δ(x cos θ+y sin θ), ∀θ

[1040] To compute the back-projection function, it is convenient to use a polar co-ordinate system. Thus, writing the above equation in (r, θ′) co-ordinates (i.e. writing x=r cos θ′ and y=r sin θ′) and substituting the result into equation (5.6), we obtain ${B\left( {r,\theta} \right)} = {{\frac{1}{2\pi}{\int_{0}^{\pi}{{\delta \left\lbrack {r\quad {\cos \left( {\theta - \theta^{\prime}} \right)}} \right\rbrack}{\theta^{\prime}}}}} = \frac{1}{r}}$

[1041] Hence, the PSF is given by ${P\left( {x,y} \right)} = \frac{1}{\sqrt{x^{2} + y^{2}}}$

[1042] The back-projection function obtained from the sequence of projections taken through an object function ƒ is therefore given by

B(x, y)=P(x, y)

ƒ(x, y)

[1043] In order to reconstruct ƒ from B we must deconvolve. This can be done by processing the equation above in Fourier space. Denoting the two-dimensional Fourier transform operator by {circumflex over (F)}₂, and using the convolution theorem, we can write

{tilde over (B)}(k _(x) , k _(y))={tilde over (P)}(k _(x) , k _(y)){tilde over (ƒ)}(k _(x) , k _(y))

[1044] where

{tilde over (ƒ)}(k _(x) , k _(y))={circumflex over (F)} ₂ƒ(x, y)

{tilde over (P)}(k _(x) , k _(y))={circumflex over (F)} ₂ P(x, y)

[1045] and

{tilde over (B)}(k _(x) , k _(y))={circumflex over (F)} ₂ B(x, y)

[1046] Rearranging ${\overset{\sim}{f}\left( {k_{x,}k_{y}} \right)} = \frac{\overset{\sim}{B}\left( {k_{x,}k_{y}} \right)}{\overset{\sim}{P}\left( {k_{x,}k_{y}} \right)}$

[1047] The function 1/{tilde over (P)} is called the inverse filter and can fortunately be computed analytically. The result is ${P\left( {k_{x,}k_{y}} \right)} = \frac{1}{\sqrt{k_{x}^{2} + k_{y}^{2}}}$

[1048] Hence, we arrive at the following reconstruction formula for the object function

ƒ(x, y)={circumflex over (F)} ₂ ⁻¹ [|k |{tilde over (B)}(k _(x) , k _(y))]

[1049] where k is the two-dimensional spatial frequency vector (k={circumflex over (x)}k_(x)+ŷk_(y)). Unfiltered back-projection produces a reconstruction which can be considered to be a blurred low-pass filtered image of the object function due to the poor transmission of high spatial frequencies. Deconvolution amplifies the high spatial frequencies inherent in the back-projection function.

[1050] Reconstruction Using the Projection Slice Theorem

[1051] The two-dimensional version of the Projection Slice Theorem (also known as the Central Slice Theorem) provides a relationship between the Radon transform of an object and its two dimensional Fourier transform. The theorem shows that the one dimensional Fourier transform of a projection at a given angle θ is equal to the function obtained by taking a radial slice through the two dimensional Fourier domain of the object at the same angle θ.

[1052] The proof of the central slice theorem comes from analysing the two-dimensional Fourier transform of an object function ƒ(r) given by

{tilde over (ƒ)}(k{circumflex over (n)})={circumflex over (F)} ₂ƒ(r)=∫ƒ(r) exp(−ik{circumflex over (n)}·r)d ² r

[1053] Substituting the result

exp(−ik{circumflex over (n)}·r)=∫exp(−-ikz)δ(z−{circumflex over (n)}·r)dz

[1054] into this equation, and changing the order of integration, we obtain ${\overset{\sim}{f}\left( {k\hat{n}} \right)} = {{\int{{f(r)}{\int{{z}\quad {\exp \left( {{- }\quad {kz}} \right)}{\delta \left( {z - {\hat{n} \cdot r}} \right)}d^{2}r}}}} = {\int{{z}\quad {\exp \left( {{- }\quad {kz}} \right)}{\int{{f(r)}{\delta \left( {z - {\hat{n} \cdot r}} \right)}d^{2}r}}}}}$

[1055] Observe, that the integral over r is just the Radon transform of ƒ and the integral over z is a one-dimensional Fourier transform. Using operator notation, we can write this result in the form

{tilde over (ƒ)}(k{circumflex over (n)})={circumflex over (F)} ₁ P({circumflex over (n)}, z)

[1056] or

{circumflex over (F)} ₂ƒ(r)={circumflex over (F)} ₁ P({circumflex over (n)}, z)

[1057] where

P({circumflex over (n)}, z)={circumflex over (R)}ƒ(r)

[1058] This theorem provides yet another way of reconstructing an object function from a set of its projections; a method which is compounded in the reconstruction formula

ƒ(r)={circumflex over (F)} ₂ ⁻¹ [{circumflex over (F)} ₁ P({circumflex over (n)}, z)]

[1059] 6. Summary

[1060] Deconvolution is concerned with the restoration of a signal or image from a recording which is resolution limited and corrupted by noise. This document has been concerned with a class of solutions to this problem which are based on different criteria for solving ill-posed problems (e.g. the least squares principle and the maximum entropy principle) in the case when the noise is additive.

[1061] Three cases have been discussed:

[1062] (i) The object is convolved with a Point Spread Function whose spectrum is continuous (e.g. a Gaussian Point Spread Function).

[1063] (ii) The object is convolved with a sinc Point Spread Function whose spectrum is discontinuous and consequently gives rise to a bandlimited image.

[1064] (iii) The image is reconstructed from a complete set of parallel projections.

[1065] Solutions to the first problem have been discussed which are based on the Wiener filter, Power Spectrum Equalization filter, the Matched filter and the Maximum Entropy Method. In addition, Bayesian estimation methods have been considered which rely on a priori information on the statistics (compounded in models for the Probability Density Function) of the noise function n_(ij) and object function ƒ_(ij). The Maximum Likelihood and Maximum a Posteriori methods are both forms of Bayesian estimation. In this report, only Gaussian statistics have been considered to illustrate the principles involved.

[1066] In all cases, knowledge of the characteristic function of the imaging system (i.e. the Point Spread Function) is required together with an estimate of the signal to noise ratio (SNR). The success of these methods depends on both the accuracy of the Point Spread Function and the SNR value used. An optimum restoration is then obtained by experimenting with different values of SNR for a given Point Spread Function.

[1067] In some cases, the PSF may either be difficult to obtain experimentally or simply not available. In such cases, it must be estimated from the data alone. This is known as ‘Blind Deconvolution’. If it is known a priori that the spectrum of the object function is ‘white’ (i.e. the average value of each Fourier component is roughly the same over the entire frequency spectrum), then any large scale variations in the recorded spectrum should be due to the frequency distribution of the PSF. By smoothing the data spectrum, an estimate of the instrument function can be established. This estimate may then be used to deconvolve the data by employing an appropriate filter.

[1068] The optimum value of the SNR when applied to the Wiener filter for example, can be obtained by searching through a range of values and for each restored image, computing the ratio of the magnitude of the digital gradient to the number of zero crossing's. This ratio is based on the idea that the optimum restoration is one which provides a well focused image with minimal ringing.

[1069] The problem of reconstructing a bandlimited function from limited Fourier data is an ill-posed problem. Hence, practical digital techniques for solving this problem tend to rely on the use of a priori information to limit the class of possible solutions. In this report, the least squares principle has been used as the basis for a solution and then modified to incorporate a priori information and provide a data consistent result. In this sense, the algorithm derived belongs to the same class as the Wiener filter and like the Wiener filter ultimately relies on the experience and intuition of a user for optimization.

[1070] Section 5 of this report discussed the problem of reconstruction from projections—a problem which is compounded in the forward and inverse Radon transform. Three types of reconstruction techniques have been derived, namely, back-projection and deconvolution, filtered back-projection and reconstruction using the central slice theorem. Although this problem is more specialised compared to deconvolution in general, it is still an important area of imaging science and has therefore been included for completeness. In addition, the Radon transform together with the Hough transform (a derivative of the Radon transform) is being used for image processing in general, in particular, for computer vision.

[1071] A detailed discussion of the computer implementation of the techniques discussed is beyond the scope of this work. However, Appendix B provides some example C-code for the 2D FFT, convolution and the Wiener filter which is provided to give the reader some additional appreciation of the software used to implement the results (i.e. filters) derived.

[1072] All the methods of restoration and reconstruction discussed here are based on the fundamental imaging equation

s=p

ƒ+n

[1073] which is a stationary model where the (blurring) effect of the PSF on the object function is the same at all locations on the ‘object plane’. In some cases, a stationary model is not a good approximation for s. Non-stationary models (in which the value of functional form of p changes with position) cannot use the methods discussed to restore/reconstruct a digital image. The basic reason for this is that the convolution theorem for a non-stationary convolution operation does not apply. However, it is possible to write out a (discrete) non-stationary convolution in terms of a matrix operation. The non-stationary deconvolution problem is then reduced to solving a large set of linear equations, the characteristic matrix being determined by the variable PSF. Another approach is to partition the image into regions in which a stationary model can be applied and deconvolve for each partition separately.

[1074] 7. Further Reading

[1075] Andrews H C and Hunt B R, Digital Image Reconstruction, Prentice-Hall, 1977

[1076] Bates R H T and McDonnel M J, Image Restoration and Reconstruction, Oxford Science Publications, 1986.

[1077] Deans S R, The Radon Transform and some of its Applications, Wiley-Interscience, 1983.

[1078] Rosenfeld A and Kak A C, Digital Picture Processing, Academic Press, 1980.

[1079] Sanz J L C, Hinkle E B and Jain A K, Radon and Projection Transform-Based Computer Vision, Springer-Verlag, 1988.

[1080] Appendix A: The Least Squares Method, the Orthogonality Principle, Norms and Hilbert Spaces

[1081] The least squares method and the orthogonality principle are used extensively in signal and image processing. This appendix has been written to provide supplementary material which would be out of context in the main body of this report.

[1082] The Least Squares Principle

[1083] Suppose, we have a real function ƒ(x) which we want to approximate by a function {circumflex over (ƒ)}(x). We can choose to construct {circumflex over (ƒ)} in such a way that its functional behaviour can be controlled be adjusting the value of a parameter a say. We can then adjust the value of a to find the best estimate {circumflex over (ƒ)} of ƒ. What is the best value of a to choose?

[1084] To solve this problem, we can construct the mean square error

e=∫[ƒ(x)−{circumflex over (ƒ)}(x, a)]² dx

[1085] where the integral is over the spatial support of ƒ(x). This error is a function of a. The value of a which produces the best approximation {circumflex over (ƒ)} of ƒ is therefore the one where e(a) is a minimum. Hence, a must be chosen so that $\frac{\partial e}{\partial a} = 0$

[1086] Substituting the expression for e into the above equation and differentiating we obtain ${\int{\left\lbrack {{f(x)} - {\hat{f}\left( {x,a} \right)}} \right\rbrack \frac{\partial}{\partial a}{\hat{f}\left( {x,a} \right)}{x}}} = 0$

[1087] Solving this equation for {circumflex over (ƒ)} provides the minimum mean square estimate for ƒ. This method is known generally as the least squares principle.

[1088] Linear Polynomial Models

[1089] To use the least squares principle, some sought of model for the estimate {circumflex over (ƒ)} must be introduced. Suppose we expand {circumflex over (ƒ)} in terms of a linear combination of (known) basis functions y_(n)(x), i.e. ${\hat{f}(x)} = {\sum\limits_{n}^{\quad}\quad {a_{n}{y_{n}(x)}}}$

[1090] For simplicity, let us first assume that ƒ is real. Since, the basis functions are known, to compute {circumflex over (ƒ)} the coefficients a_(n) must be found. Using the least squares principle, we require a_(n) such that the mean square error $e = {\int{\left( {{f(x)} - {\sum\limits_{n}^{\quad}\quad {a_{n}{y_{n}(x)}}}} \right)^{2}{x}}}$

[1091] is a minimum. This occurs when $\frac{\partial e}{\partial a_{m}} = {0{\forall m}}$

[1092] Differentiating, ${\frac{\partial}{\partial a_{m}}{\int{\left( {{f(x)} - {\sum\limits_{n}^{\quad}\quad {a_{n}{y_{n}(x)}}}} \right)^{2}{x}}}} = {2{\int{\left( {{f(x)} - {\sum\limits_{n}^{\quad}\quad {a_{n}{y_{n}(x)}}}} \right)\frac{\partial}{\partial a_{m}}\left( {{f(x)} - {\sum\limits_{n}^{\quad}\quad {a_{n}{y_{n}(x)}}}} \right){x}}}}$

[1093] Now ${\frac{\partial}{\partial a_{m}}\left( {{f(x)} - {\sum\limits_{n}^{\quad}\quad {a_{n}{y_{n}(x)}}}} \right)} = {{- \frac{\partial}{\partial a_{m}}}{\sum\limits_{n}^{\quad}\quad {a_{n}{y_{n}(x)}}}}$ $\begin{matrix} {\quad {= {{- \frac{\partial}{\partial a_{m}}}\left( {\ldots + {a_{1}{y_{1}(x)}} + {a_{2}{y_{2}(x)}} + \ldots + {a_{n}{y_{n}(x)}} + \ldots} \right)}}} \\ {{= {- {y_{1}(x)}}},{m = 1}} \\ {{= {- {y_{2}(x)}}},{m = 2}} \\ {\quad \vdots} \\ {{= {- {y_{m}(x)}}},{m = n}} \end{matrix}$

[1094] Hence, $\frac{\partial e}{\partial a_{m}} = {{{- 2}{\int{\left( {{f(x)} - {\sum\limits_{n}^{\quad}\quad {a_{n}{y_{n}(x)}}}} \right){y_{m}(x)}{x}}}} = 0}$

[1095] The coefficients a_(n) which minimize the mean square error for a linear polynomial model are therefore obtained by solving the equation $\int{{f(x)}{y_{m}(x)}{\sum\limits_{n}^{\quad}\quad {a_{n}{\int{{y_{n}(x)}{y_{m}(x)}{x}}}}}}$

[1096] for a_(n).

[1097] The Orthogonality Principle

[1098] The above result demonstrates that the coefficients a_(n) are such that the error ƒ−{circumflex over (ƒ)} is orthogonal to the basis functions y_(m). We can write this result in the form

ƒ−{circumflex over (ƒ)}, y _(m)

≡∫[ƒ(x)−{circumflex over (ƒ)}(x)]y _(m)(x)dx=0

[1099] This is known as the orthogonality principle.

[1100] Complex Functions, Norms and Hilbert Spaces

[1101] Consider the case when ƒ is a complex function. In this case, {circumflex over (ƒ)} must be a complex estimate of this function. We should therefore also assume that both y_(n) and a_(n) are complex for generality.

[1102] The mean square error should then be defined by $e = {\int{{{{f(x)} - {\sum\limits_{n}^{\quad}\quad {a_{n}{y_{n}(x)}}}}}^{2}{x}}}$

[1103] The operation

(∫|ƒ(x)|²dx)^(1/2)

[1104] defines the Euclidean norm of the function ƒ which is denoted by the sign ∥•∥. If ƒ is discrete and—sampled at points ƒ_(n) say, then the Euclidean norm is defined by $\left( {\sum\limits_{n}^{\quad}{f_{n}}} \right)^{\frac{1}{2}}$

[1105] Using this notation, we can write the mean square error in the form

e=∥ƒ(x)−{circumflex over (ƒ)}(x)∥²

[1106] which saves having to write integral signs (for piecewise continuous functional analysis) or summation signs (for discrete functional analysis) all the time. Note, there are many other definitions of norms which fall into the general classification ${{{f(x)}}_{p} = \left( {\int{{{f(x)}}^{p}{x}}} \right)^{\frac{1}{2}}},{p = 1},2,\ldots$

[1107] However, the Euclidean norm is one of the most useful and is the basis for least squares estimation methods in general.

[1108] The error function e is an example of a ‘Hilbert space’ which is a vector space. It is a function of the complex coefficients a_(n) and is a minimum when $\begin{matrix} {{\frac{\partial e}{\partial a_{m}^{r}} = 0}{and}} \\ {\frac{\partial e}{\partial a_{m}^{i}} = 0} \end{matrix}$

[1109] where

a_(m) ^(r)=Re[a_(m)]

[1110] and

a_(m) ^(i)=Im[a_(m)]

[1111] The above conditions lead to the result ${\int{\left( {{f(x)} - {\sum\limits_{\quad n}^{\quad}\quad {a_{n}{y_{n}(x)}}}} \right){y_{m}^{*}(x)}{x}}} = 0$

[1112] or

ƒ−{circumflex over (ƒ)}, y* _(m)

=0

[1113] which follows from the analysis below: $\begin{matrix} \quad & \quad \\ {{e = {\int\left| {f - {\sum\limits_{\quad n}^{\quad}\quad {\left( {a_{n}^{r} + {ia}_{n}^{i}} \right)y_{n}}}} \middle| {}_{2}{{x}{\int\left( {f - {\sum\limits_{\quad n}^{\quad}\quad {\left( {a_{n}^{r} + {ia}_{n}^{i}} \right)y_{n}}}} \right)}} \right.}}{{\left( {f^{*} - {\sum\limits_{\quad n}^{\quad}{\left( {a_{n}^{r} - {ia}_{n}^{i}} \right)\quad y_{n}^{*}}}} \right){x}\frac{\partial e}{\partial a_{m}^{r}}} = {{{\int{\left( {f - {\sum\limits_{\quad n}^{\quad}\quad {\left( {a_{n}^{r} + {ia}_{n}^{i}} \right)y_{n}}}} \right)y_{m}^{*}{x}}} - {\int{\left( {f^{*} - {\sum\limits_{\quad n}^{\quad}\quad {\left( {a_{n}^{r} + {ia}_{n}^{i}} \right)y_{n}^{*}}}} \right)y_{m}{x}}}} = 0}}} & ({A1}) \\ {\frac{\partial e}{\partial a_{m}^{i}} = {{{i{\int{\left( {f - {\sum\limits_{\quad n}^{\quad}\quad {\left( {a_{n}^{r} + {ia}_{n}^{i}} \right)y_{n}}}} \right)y_{m}^{*}{x}}}} - {\int{\left( {f^{*} - {\sum\limits_{\quad n}^{\quad}\quad {\left( {a_{n}^{r} - {ia}_{n}^{i}} \right)y_{n}^{*}}}} \right)y_{m}{x}}}} = {{{0{\int{\left( {f - {\sum\limits_{\quad n}^{\quad}\quad {\left( {a_{n}^{r} + {ia}_{n}^{i}} \right)y_{n}}}} \right)y_{m}^{*}{x}}}} - \quad {\int_{\quad}^{\quad}{\left( {f^{*} - {\sum\limits_{n}^{\quad}\quad {\left( {a_{n}^{r} - {ia}_{n}^{i}} \right)y_{n}^{*}}}} \right)y_{m}\quad {x}}}} = 0}}} & ({A2}) \end{matrix}$

[1114] Equation (A2) minus equation (A1) gives ${\int{\left( {f - {\sum\limits_{\quad n}^{\quad}\quad {\left( {a_{n}^{r} + {ia}_{n}^{i}} \right)y_{n}}}} \right)y_{m}^{*}{x}}} = 0$

[1115] or ${\int{\left( {f - {\sum\limits_{\quad n}^{\quad}\quad {a_{n}y_{n}}}} \right)y_{m}^{*}{x}}} = 0$

[1116] Linear Convolution Models

[1117] So far, we have demonstrated the least squares principle for approximating a function using a model for the estimate {circumflex over (ƒ)} of the form ${\hat{f}(x)} = {\sum\limits_{\quad n}^{\quad}\quad {a_{n}{y_{n}(x)}}}$

[1118] Another model which has a number of important applications is the linear convolution model

{circumflex over (ƒ)}(x)=y(x)

a(x)

[1119] In this case, the least squares principle can again be used to find the function a. A simple way to show how this can be done is to demonstrate the technique for digital signals and then use a limiting argument for continuous functions.

[1120] Real Discrete Functions—Digital Signals

[1121] If ƒ_(i) is a real discrete function, i.e. a vector consisting of a set of numbers ƒ₁, ƒ₂, ƒ₃, . . . etc., then we may use a linear convolution model for the discrete estimate {circumflex over (ƒ)}_(i) given by ${\hat{f}}_{i} = {\sum\limits_{\quad j}^{\quad}\quad {y_{i - j}a_{j}}}$

[1122] In this case, using the least squares principle, we find a_(i) by minimizing the mean square error $e = {\sum\limits_{\quad i}^{\quad}\quad \left( {f_{i} - {\hat{f}}_{i}} \right)^{2}}$

[1123] This error is a minimum when ${{\frac{\partial\quad}{\partial a_{k}}{\sum\limits_{\quad i}^{\quad}\quad \left( {f_{i} - {\sum\limits_{\quad j}^{\quad}{y_{i - j}a_{j}}}} \right)^{2}}} = 0}\quad$

[1124] Differentiating, we get ${{- 2}{\sum\limits_{\quad I}^{\quad}\quad {\left( {f_{i} - {\sum\limits_{\quad j}^{\quad}\quad {y_{i - j}a_{j}}}} \right)\frac{\partial\quad}{\partial a_{k}}{\sum\limits_{\quad j}^{\quad}\quad {y_{i - j}a_{j}}}}}} = {{{- 2}{\sum\limits_{\quad i}^{\quad}\quad {\left( {f_{i} - {\sum\limits_{\quad j}^{\quad}\quad {y_{i - j}a_{j}}}} \right)y_{i - k}}}} = 0}$

[1125] and rearranging, we have ${\sum\limits_{i}^{\quad}\quad {f_{i}y_{i - k}}} = {\sum\limits_{i}^{\quad}{\left( {\sum\limits_{j}^{\quad}{y_{i - j}a_{j}}} \right)y_{i - k}}}$

[1126] The left hand side of this equation is just the discrete correlation of ƒ_(i) with y_(i) and the right hand side is a correlation of y_(i) with $\sum\limits_{j}^{\quad}{y_{i - j}a_{j}}$

[1127] which is itself just a discrete convolution of y_(i) with a_(i). Hence, using the appropriate symbols we can write this equation as

ƒ_(i) ⊙y _(i)=(y _(i)

a _(i))⊙y _(i)

[1128] Real Continuous Functions—Analogue Signals

[1129] For continuous functions, the optimum function a which minimizes the mean square error

e=∫[ƒ(x)−{circumflex over (ƒ)}(x)]² dx

[1130] where

{circumflex over (ƒ)}(x)=a(x)

y(x)

[1131] is obtained by solving the equation

[ƒ(x)−a(x)

y(x)]⊙y(x)=0

[1132] This result is based on extending the result derived above for digital signals to infinite sums and using a limiting argument to integrals.

[1133] Complex Digital Signals

[1134] If the data are a elements of a complex discrete function ƒ_(i) where ƒ_(i) corresponds to a set of complex numbers ƒ₁, ƒ₂, ƒ₃, . . . , then we use the mean square error defined by $e = {\sum\limits_{i}^{\quad}{{f_{i} - {\hat{f}}_{i}}}^{2}}$

[1135] and a linear convolution model of the form ${\hat{f}}_{i} = {\sum\limits_{j}^{\quad}{y_{i - j}a_{j}}}$

[1136] In this case, the error is a minimum when $\frac{\partial e}{\partial a_{k}^{\prime}} = {{\sum\limits_{i}^{\quad}{\left( {f_{i} - {\sum\limits_{j}^{\quad}{y_{i - j}a_{i}}}} \right)y_{i - k}^{*}}} = 0}$

[1137] or

ƒ_(i) ⊙y* _(i)=(y _(i)

a _(i))⊙y* _(i)

[1138] Complex Analogue Signals

[1139] If {circumflex over (ƒ)}(x) is a complex estimate given by

{circumflex over (ƒ)}(x)=a(x)

y(x)

[1140] then the function a(x) which minimizes the error

e=∥ƒ(x)−{circumflex over (ƒ)}(x)∥²

[1141] is given by solving the equation

[ƒ(x)−a(x)

y(x)]⊙(y*(x)=0

[1142] This result is just another version of the orthogonality principle.

[1143] Points on Notation

[1144] Note that in the work presented above, the symbols

and ⊙ have been used to denote convolution and correlation respectively for both continuous and discrete data. With discrete signals,

and ⊙ denote convolution and correlation sums respectively. This is indicated by the presence of subscripts on the appropriate functions. If subscripts are not present, then the functions in question are continuous and

and ⊙ are taken to denote convolution and correlation integrals respectively.

[1145] Two Dimensions

[1146] In two dimensions, the least squares method may also be used to approximate a function using the same methods that have been presented above. For example, suppose we wish to approximate the complex 2D function ƒ(x, y) using an estimate of the form ${\hat{f}\left( {x,y} \right)} = {\sum\limits_{n}^{\quad}{\sum\limits_{m}^{\quad}{a_{n\quad m}{\varphi_{n\quad m}\left( {x,y} \right)}}}}$

[1147] In this case, the mean square error is given by

e=∫∫|ƒ(x, y)−{circumflex over (ƒ)}(x, y)|² dxdy

[1148] Using the orthogonality principle, this error is a minimum when ${\int{\int{\left\lbrack {{f\left( {x,y} \right)} - {\sum\limits_{n}^{\quad}{\sum\limits_{m}^{\quad}{a_{n\quad m}{\varphi_{n\quad m}\left( {x,y} \right)}}}}} \right\rbrack {\varphi_{pq}^{*}\left( {x,y} \right)}{x}{y}}}} = 0$

[1149] This is just a two dimensional version of the orthogonality principle. Another important linear model that is used for designing two dimensional digital filters is ${\hat{f}}_{ij} = {\sum\limits_{n}^{\quad}{\sum\limits_{m}^{\quad}{y_{{i - n},{j - m}}a_{n\quad m}}}}$

[1150] In this case, for complex data, the mean square error $e = {\sum\limits_{i}^{\quad}{\sum\limits_{j}^{\quad}{{f_{ij} - {\hat{f}}_{ij}}}^{2}}}$

[1151] is a minimum when ${\sum\limits_{i}^{\quad}{\sum\limits_{j}^{\quad}{\left( {f_{ij} - {\sum\limits_{n}^{\quad}{\sum\limits_{m}^{\quad}{y_{{i - n},{j - m}}a_{n\quad m}}}}} \right)y_{{i - p},{j - q}}^{*}}}} = 0$

[1152] Using the appropriate symbols we can write this equation in the form

ƒ_(ij) ⊙⊙y* _(ij)=(y _(ij)

a _(ij))⊙⊙y* _(ij)

[1153] For continuous functions, when

{circumflex over (ƒ)}(x, y)=y(x, y)

a(x, y)

[1154] the error

e=∫∫|ƒ(x, y)−{circumflex over (ƒ)}(x, y)|² dxdy

[1155] is a minimum when

[ƒ(x, y)−a(x, y)

y(x, y)]⊙⊙y*(x, y)=0

SECTION 5

[1156] Title: “Predictive Apparatus and Method”

[1157] THIS INVENTION relates to apparatus including a computer, for predicting trends and outcomes in fields involving phenomena which, in fractal terms, are statistically self-affine.

[1158] WO99/17260, the disclosure of which is incorporated herein by reference, incorporates a discussion of fractal concepts applied to the statistics of phenomena having a significant random or pseudo-random component and provides a mathematical treatment of a technique for imposing a so-called fractal modulation upon such phenomena whereby information can be encoded in, for example, a printed image on documents such as banknotes, in such a way as not to be apparent upon ordinary visual inspection or scrutiny, and likewise provides a mathematical treatment of a corresponding technique for a converse demodulation process by means of which such information can be recovered from the printed image, for example to verify the authenticity of the document.

[1159] The inventors in respect of the present application have discovered that similar fractal statistical demodulation techniques can also be used in extracting useful information from “natural” phenomena which exhibit similarly statistically fractal characteristics and which, in particular, are statistically self-affine, in the sense in which that term is used in the mathematics of fractals.

[1160] According to one aspect of the invention, there is provided a method of deriving predictive information relating to phenomena which are statistically fractal in a time dimension, comprising analysing, by computer means, statistical data relating to such phenomena at different times, such computer means being arranged to execute a program such as to perform, on said data, mathematical processes based upon fractal demodulation, in order to derive predictive information relating to the phenomena.

[1161] According to another aspect of the invention, there is provided apparatus for deriving predictive information relating to statistically fractal information, including a computer programmed to perform, on said data, mathematical processes based upon fractal demodulation, to derive predictive information relating to the phenomena.

[1162] According to another aspect of the invention, there is provided a data carrier, such as a floppy disk or CD-ROM, carrying a program for a computer whereby a computer programmed with the program may carry out the method of the invention.

[1163] The said mathematical processes based upon fractal demodulation may be or include mathematical processes disclosed in WO99/17260.

[1164] The computer program concerned may suitably incorporate an algorithm or algorithms of general application to phenomena of a broad class. The applicants envisage that the program, in a simple form, may be arranged to apply, to the relevant data, two such general algorithms successively.

[1165] In one embodiment the method is applied to the analysis of financial data, for example relating to the stock market or commodity prices or the like, to provide a more reliable detection and prediction of economic trends. Thus, in accordance with this embodiment, it may be possible to detect the first signs of a market “crash” months before it would occur, and in time to allow financial institutions to take remedial action.

[1166] In another embodiment, the method is applied to the field of medicine. The inventors have discovered that epidemiological data on a geographical perspective is statistically self-affine, irrespective of the type of disease concerned, and is thus susceptible of study by the method and apparatus of the invention. Thus, the invention may provide a new tool in the study of cause and effect in matters of health.

[1167] The applicants believe that this approach will, in the future, be of significant value in the analysis of health care, and in allowing government expenditure on health care to be appropriately directed.

[1168] The applicants believe that the following areas of medicine are among those which will benefit from the invention:

[1169] 1. The analysis and comparison of genetic sequences with insertions and deletions.

[1170] 2. The analysis of the complex electrical patterns of the heart (cardiac arrhythmias) and brain (EMG recording).

[1171] 3. Epidemiology of infectious disease and targeting of vaccination, including epidemic prediction worldwide, particularly with regard to illnesses which are poorly understood such as B.S.E. and ME.

[1172] 4. Pharmacology. When analysing a new pharmaceutical product the importance of crucial data may not be appreciated when normal Gaussian statistical models are applied. Millions, if not billions, of dollars may be invested in a new drug's development. All this can be lost if the drug has to be unexpectedly withdrawn due to a side effect not predicted. If the invention could help predict these events the commercial benefit would be enormous.

[1173] 5. Engineering of raw pharmaceuticals. The way viruses, bacteria, cancerous cells, etc., evolve and mutate (cell dynamics) appears initially to be random. If this “natural selection” could be predicted a pharmaceutical response could be prepared. Examples would include Multidrug resistant TB, HIV treatment, and MRSA (a common resistant infection in hospitals). Drug resistance is an increasing problem. If the mutations made by bacterial could be predicted then treatment could be appropriately devised. Application of the invention to appropriate data may provide warning of a multi-drug resistant crisis and allow antibiotic use to be curtailed immediately, or other preventative measures to be taken.

[1174] The invention may, of course, be applied to analysis and prediction of events involving other phenomena exhibiting fractal, self-affine, behaviour. For example, the invention may be applied to weather forecasting or climatological forecasting etc.

[1175] In the present specification “comprises” means “includes or consists of” and “comprising” means “including or consisting of”.

[1176] The features disclosed in the foregoing description, or the following claims, or the accompanying drawings, expressed in their specific forms or in terms of a means for performing the disclosed function, or a method or process for attaining the disclosed result, as appropriate, may, separately, or in any combination of such features, be utilised for realising the invention in diverse forms thereof.

CONCLUDING SECTION

[1177] From the foregoing, it will be appreciated that some of the aspects of the invention disclosed are concerned with converting meaningful data (or plain text) to a chaotic or pseudo chaotic form (i.e. encrypted form) whilst other aspects disclosed are concerned with the interpretation of chaotic or seemingly chaotic data in such a way as to derive more meaningful information from it. Thus, for example, the invention in some of these other aspects allows day-to-day variation in hospital admissions for example to be interpreted so as to provide reliable predictions of future demand for hospital beds, or allows short-term variations in meteorological measurements to be interpreted to provide predictions of future weather or climate, or allows seemingly chaotic variations is histology slides to provide a screening of normal specimens from pathological or possibly pathological ones. 

1. A document, card, or the like, having an area adapted to receive a signature or other identifying marking, and which bears a two-dimensional coded marking adapted for reading by a complementary automatic reading device.
 2. A document, card, or the like in accordance with claim 1, wherein said area is adapted to have a signature written thereon.
 3. An automatic reading device for reading document, card, or the like in accordance with claim 2, and which is capable of reading the information coded into said two-dimensional coded marking and of comparing the modification of the coded marking due to a signature or other marking thereon with the effect of a signature which is the subject of stored data to which the reading device has access in order to determine whether the signature on said area is that of the individual whose signature is the subject of that stored data.
 4. A data encryption system utilising fractal principles.
 5. A data encryption system utilising the mathematics of chaos.
 6. A data encryption system which uses a chaotic or pseudo-random key and a camouflage encryption key.
 7. A method of processing a section of video “footage” to produce a “still” view of higher visual quality than the individual frames of that footage, comprising sampling, over a plurality of video “frames”, image quantities (such as brightness and hue or colour) for corresponding points over such frames, and processing the samples to produce a high quality “still” frame.
 8. A method according to claim 7 wherein said processing includes utilisation of one or more of the techniques disclosed in Part 2 of this specification, including the appendices thereto.
 9. A method according to claim 7 or claim 8 wherein an initial stage of said processing comprises averaging said image quantities for such corresponding image points over a plurality of video frames to produce an average frame, utilised in subsequent processing steps.
 10. Apparatus for processing a section of video footage to produce a “still” view of higher visual quality than the individual frames of that footage, the apparatus comprising means for receiving data in digital form corresponding to said frames, processing means for processing such data and producing digital data corresponding to an enhanced image based on such individual frames, and means for displaying or printing said enhanced image.
 11. Apparatus according to claim 10 wherein said processing means is programmed to execute one or more of the processing techniques disclosed in Part 2 of this specification, including the appendices thereto.
 12. A method of deriving predictive information relating to phenomena which are statistically fractal in a time dimension, comprising analysing, by computer means, statistical data relating to such phenomena at different times, such computer means being arranged to execute a program such as to perform, on said data, mathematical processes based upon fractal demodulation, in order to derive predictive information relating to the phenomena.
 13. Apparatus for deriving predictive information relating to statistically fractal information, including a computer programmed to perform, on said data, mathematical processes based upon fractal demodulation, to derive predictive information relating to the phenomena.
 14. A data carrier, such as a floppy disk or CD-ROM, carrying a program for a computer whereby a computer programmed with the program may carry out the method of the invention. 